Abstract

Independent Component Analysis (ICA) recently has attracted much attention in
the statistical literature as an attractive and useful alternative to elliptical models.
Whereas k-dimensional elliptical densities depend on one single unspecified radial
density, however, k-dimensional independent component distributions involve k unspecified
component densities. In practice, for a given sample size n and given dimension k, this
makes the statistical analysis much harder. We focus here on the estimation, from an
independent sample, of the mixing/demixing matrix of the model. Traditional methods
(FOBI, Kernel-ICA, FastICA) mainly originate from the engineering literature.
The statistical properties of those methods are not well known, and they typically
require very large samples. So does the "classical semiparametric" approach by Chen
and Bickel (2006), which is based on an estimation of the k component densities (those
densities being those of the unobserved independent components). The "double scatter
matrix" method of Oja et al. (2006) and (2008) requires the arbitrary choice of two
scatter matrices generally based on estimated higher-order moments which are likely
to be poorly robust. As a reaction, an efficient (signed-)rank-based approach has been
proposed by Ilmonen and Paindaveine (2011) for the case of symmetric component
densities; their estimators unfortunately fail to be root-n consistent as soon as one of
the component densities violates the symmetry assumption. In this paper, using ranks
rather than signed ranks, we extend their approach to the asymmetric case and propose
a one-step R-estimator for ICA mixing matrices. The finite-sample performances
of those estimators are investigated and compared to those of existing methods under
moderately large sample sizes. Particularly good performances are obtained from a
version involving data-driven scores taking into account the skewness and kurtosis of
residuals. Finally, we show, by an empirical exercise, that our methods also may provide
excellent results in contexts such as image analysis, where the basic assumptions
of ICA are quite unlikely to hold.

1 Introduction

1.1 Independent Component Analysis (ICA) 

The traditional Gaussian model for noise, where a k-dimensional error term $\varepsilon$ is
$\mathcal{N}(0, \Sigma)$, can be extended, mainly, into two directions. Either the elliptical density
contours of the multinormal are preserved, and $\varepsilon$ is assumed to be elliptically symmetric
with respect to the origin, with unspecified radial density $f$. Or, the independence of the
marginals of $\Sigma^{-1/2}\varepsilon$ is preserved, but their densities $f_1, \ldots, f_k$ remain
unspecified, yielding the independent component model. In both cases, the distribution of
$\varepsilon$ involves an unknown linear transformation: the $k \times k$ symmetric positive definite
sphericizing matrix $\Sigma^{-1/2}$, that is, $k(k+1)/2$ parameters, in the elliptical case; the
$k \times k$ mixing matrix $A$ (equivalently, the demixing matrix $A^{-1}$), that is, $k^2$ parameters,
in the independent component case. The main difference, however, is that, while elliptical
noise only depends on one nonparametric nuisance, the radial density $f$, independent
component noise involves k nonparametric nuisances, the component densities $f_1, \ldots, f_k$.
This makes the statistical analysis of models based on independent component noise
significantly harder than its elliptical counterpart: for given k and n, for instance,
estimating $A$ is much more difficult than estimating $\Sigma$.

In this paper, we focus on the problem of estimating $A$. Many solutions (FastICA,
Kernel-ICA, FOBI, ...) have been proposed, mostly in the engineering literature; see
Section [O] for details. They typically require very large samples, and their statistical properties
are not always well known. A method based on the availability of two scatter matrices has
been developed by Oja et al. (2006) and (2008), and involves the somewhat arbitrary choice
of two scatter matrices, generally based on estimated higher-order moments which are likely
to be poorly robust and quite sensitive to possibly heavy tails in some of the component
densities; also, a symmetrization step is required by the method, which is computationally
quite demanding. A rigorous asymptotic analysis of the problem is provided by Chen and
Bickel (2006) in line with the classical semiparametric methodology of Bickel et al. (1993), based
on tangent space projections. In that approach, the k component densities $f_1, \ldots, f_k$ need
to be estimated, which again is very costly, and requires very large sample sizes.

As a reaction, an efficient rank-based method has been developed recently by Ilmonen
and Paindaveine (2011), taking into account the invariance and distribution-freeness features
of ranks in order to bypass the costly step of estimating k densities. The performance of their
estimators (call them $R_+$-estimators) is quite good, even under moderately large samples.
However, they are based on marginal signed ranks, which requires the somewhat restrictive
assumption that all component densities are symmetric.

We show how that unpleasant assumption can be avoided, and propose a one-step
R-estimation procedure based on residual ranks rather than the residual signed ranks used in
$R_+$-estimation. We establish the asymptotic root-n consistency and asymptotic normality
of our R-estimators, and carefully study their finite-sample performances via simulations. In
particular, we show how they improve on the traditional methods (FOBI, FastICA,
Kernel-ICA, and some others), and outperform Ilmonen and Paindaveine's $R_+$-estimators as soon
as the symmetry assumption is violated (in which case their estimators are no longer root-n
consistent). R-estimation, as well as $R_+$-estimation, in this context, requires choosing k score
functions, a choice that in practice may be somewhat difficult. We therefore describe and
recommend a version of our method based on data-driven scores, where the skewness and
kurtosis of component residuals are taken into account. That method is easily implementable,
and achieves particularly good results.

Finally, with an application to image analysis, we show that our method also provides
good results in situations where the basic assumptions of ICA clearly do not hold. There,
our R-estimators are shown to improve, quite substantially, the demixing performances of
such classical methods as FOBI, FastICA or Kernel-ICA.



1.2 Notation, identifiability and main assumptions 

Denote by $X^{(n)} := (X_1^{(n)\prime}, \ldots, X_n^{(n)\prime})'$, $n \in \mathbb{N}$, with
$X_i^{(n)\prime} := (X_{i1}^{(n)}, \ldots, X_{ik}^{(n)})$, $i = 1, \ldots, n$, a
triangular array of observed k-dimensional random vectors satisfying

$X_i^{(n)} = \mu + A Z_i^{(n)}$ (1.1)

where $Z^{(n)} := (Z_1^{(n)\prime}, \ldots, Z_n^{(n)\prime})'$ is an unobserved n-tuple of i.i.d. k-dimensional latent
vectors $Z_i^{(n)} := (Z_{i1}^{(n)}, \ldots, Z_{ik}^{(n)})'$, $i = 1, \ldots, n$, with joint and marginal densities $f^Z$ and $f_1, \ldots, f_k$
such that

$f^Z(z) = \prod_{j=1}^{k} f_j(z_j)$, $z = (z_1, \ldots, z_k)' \in \mathbb{R}^k$. (1.2)

The $k \times 1$ vector $\mu$ and the $k \times k$ full-rank matrix $A$ are parameters; $A$ and its inverse $A^{-1}$
are called the mixing and demixing (or unmixing) matrices, respectively. Under (1.2), the k
components $Z_{i1}^{(n)}, \ldots, Z_{ik}^{(n)}$ of the latent vectors $Z_i^{(n)}$ are mutually independent: they are called
the independent components, and their marginal probability densities $f := (f_1, \ldots, f_k)$ the
component densities, of the independent component model (1.1)-(1.2).
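
As a minimal illustration of model (1.1)-(1.2), the following Python sketch draws an i.i.d. sample of latent vectors with independent components and mixes them. The function name `simulate_ic_model`, the three component densities (a median-centered exponential, a standard normal and a Laplace) and the numerical values of the location and mixing matrix are assumptions chosen for illustration only; none of them comes from the paper.

```python
import numpy as np

def simulate_ic_model(n, mu, A, rng):
    """Draw n observations X_i = mu + A Z_i with independent components (k = 3)."""
    mu = np.asarray(mu, dtype=float)
    A = np.asarray(A, dtype=float)
    if mu.shape != (3,) or A.shape != (3, 3):
        raise ValueError("this sketch is written for k = 3")
    # Illustrative component densities, each with median zero.
    Z = np.column_stack([
        rng.exponential(size=n) - np.log(2.0),  # skewed (median of Exp(1) is log 2)
        rng.standard_normal(n),                 # symmetric, light tails
        rng.laplace(size=n),                    # symmetric, heavier tails
    ])
    X = mu + Z @ A.T                            # row i is mu + A Z_i
    return X, Z

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
A = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.1, 0.6, 1.0]])
X, Z = simulate_ic_model(5000, mu, A, rng)
```

Only `X` would be observed in practice; `Z` is returned here so that demixing results can be compared with the latent components.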

Identification constraints clearly are needed in order for $\mu$ and $A$ to be identified. Without
any loss of generality we throughout impose that $f \in \mathcal{F}_0$, where

$\mathcal{F}_0 := \{ f := (f_1, \ldots, f_k) \mid f_j(z) > 0 \text{ for all } z \in \mathbb{R}, \text{ and } \int_{-\infty}^{0} f_j(z)\,dz = 1/2 = \int_{0}^{\infty} f_j(z)\,dz \}$;

the vector $A^{-1}\mu$ then is identified as the componentwise median of the $A^{-1}X_i^{(n)}$'s.
Identification issues for $A$ are more severe, due to the invariance of the IC assumption (1.1)
and (1.2) under permutation, rescaling, and sign changes of the centered independent
components $Z_i^{(n)} - A^{-1}\mu$. Denoting by $D_1$ and $D_2$ two arbitrary full-rank $k \times k$ diagonal
matrices, and by $P$ an arbitrary $k \times k$ permutation matrix, we clearly have that $AZ = A^*Z^*$
for $A^* = A D_1 P D_2$ and $Z^* = D_2^{-1} P^{-1} D_1^{-1} Z$, where $Z^*$ still satisfies (1.1) and (1.2). The
mixing matrices $A$ and $A^*$ therefore are observationally equivalent.

Several identification constraints have been proposed in the literature in order to tackle
this identifiability issue. Those we are imposing here are borrowed from Ilmonen and
Paindaveine (2011). Considering the equivalence classes of $k \times k$ nonsingular matrices associated
with the equivalence relation $A^* \sim A$ iff $A^* = A D_1 P D_2$ for some permutation and full-rank
diagonal matrices $P$, $D_1$ and $D_2$, respectively, denote by $\Pi$ the mapping

$A \mapsto \Pi(A) := A D_1^A P^A D_2^A$ (1.3)

where (a) $D_1^A$ is the $k \times k$ positive diagonal matrix whose jth diagonal element is the inverse
of the Euclidean norm of $A$'s jth column ($j = 1, \ldots, k$), (b) $P^A$ is a permutation matrix that
reorders the columns of $A D_1^A$ in such a way that $|(A D_1^A P^A)_{ij}| < |(A D_1^A P^A)_{ii}|$ for all $j > i$,
and (c) the (not necessarily positive) diagonal matrix $D_2^A$ normalizes $A D_1^A P^A$ in such a
way that $(A D_1^A P^A D_2^A)_{jj} = 1$, i.e. $(D_2^A)_{jj} = (A D_1^A P^A)_{jj}^{-1}$ for $j = 1, \ldots, k$. Consider the
set $\mathcal{M}_k$ of nonsingular $k \times k$ matrices for which no tie occurs in the definition of $P^A$. Then,
for $A_1, A_2 \in \mathcal{M}_k$, $A_1 \sim A_2$ if and only if $\Pi(A_1) = \Pi(A_2)$. Each class of equivalence thus
contains a unique element $A$ such that $\Pi(A) = A$, and inference for mixing matrices can be
restricted to the set $\mathcal{M}_k^1 := \Pi(\mathcal{M}_k)$.
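
The three steps (a)-(c) defining the mapping in (1.3) can be sketched in a few lines of Python. This is an illustrative reading rather than the authors' implementation: the function name `pi_map` is hypothetical, the column reordering of step (b) is carried out greedily row by row (which yields the required dominance of each diagonal entry over the entries to its right when no tie occurs), and the sketch assumes that no diagonal entry vanishes after reordering.

```python
import numpy as np

def pi_map(A):
    """Return a representative of the equivalence class of A, following (a)-(c)."""
    A = np.asarray(A, dtype=float)
    k = A.shape[0]
    # (a) rescale every column to unit Euclidean norm
    B = A / np.linalg.norm(A, axis=0)
    # (b) reorder columns so that, in row i, the diagonal entry dominates
    #     (in absolute value) all entries located to its right
    for i in range(k):
        j = i + int(np.argmax(np.abs(B[i, i:])))
        B[:, [i, j]] = B[:, [j, i]]
    # (c) rescale every column so that the diagonal equals one
    return B / np.diag(B)

A = np.array([[2.0, 1.0, 0.5],
              [0.3, -4.0, 1.0],
              [1.0, 0.2, 3.0]])
L = pi_map(A)
```

Applying `pi_map` to any column-rescaled, column-permuted or sign-changed version of `A` should return the same matrix `L`, which is how the observational equivalence described above is resolved.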

The matrices $A$ for which ties occur in the construction of $P^A$ have Lebesgue measure zero
in $\mathbb{R}^{k \times k}$; neglecting them has few practical implications. While one could devise a systematic
way to define a unique $P^A$ in the presence of such ties, the resulting mapping $A \mapsto P^A$ would
not be continuous, which disallows the use of the Delta method when constructing root-n
consistent estimators for $A$.

For $L \in \mathcal{M}_k^1$, denote by $\theta = (\mu, \mathrm{vecd}^\circ(L))$ the model parameter, where $\mathrm{vecd}^\circ(L)$ stands
for the vector of size $k(k-1)$ that stacks the columns of $L$ on top of each other with the diagonal
elements omitted (since, by definition, they are set to one). Write $\Theta := \mathbb{R}^k \times \mathrm{vecd}^\circ(\mathcal{M}_k^1)$
for the parameter space. Note that, by imposing scaling and some nonnegative asymmetry
constraints on the component densities, one could add the (unique) diagonal matrix $D^A$ such
that $D_1^A P^A D_2^A = P^A D^A$ to the list of (nuisance) parameters. In the present context, it is
more convenient to have it absorbed into the unspecified form of $f$. The role of $D^A$ is quite
similar, in that respect, to that of the scale functional in elliptical families, as discussed in
Hallin and Paindaveine (2006).

Another solution to those identification problems is adopted by Chen and Bickel (2006),
who impose scaling restrictions on $f$, and then let their PCFICA algorithm (Chen and
Bickel 2005) make a choice between the various observationally equivalent values of the
demixing matrix $A^{-1}$. Specifically, they restrict to a parameter space for demixing matrices
consisting of full-rank $k \times k$ matrices that satisfy the following: every row has unit norm,
the element with largest absolute value in each row is positive, and rows are ordered by their
maximum element. This parameter space, like $\mathcal{M}_k^1$, contains unique representatives from
equivalence classes amongst observationally equivalent $k \times k$ full-rank matrices.

2 Local asymptotic normality and group invariance

2.1 Group Invariance and semiparametric efficiency 

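Denoting by $P^{(n)}_{\theta;f}$, $P^{(n)}_{\mu,L;f}$ or $P^{(n)}_{\mu,\mathrm{vecd}^\circ(L);f}$ the joint distribution of $X^{(n)}$ under location $\mu$, mixing
matrix $A$ such that $\Pi(A) = L$, and component densities $f = (f_1, \ldots, f_k)$, let

$\mathcal{P}^{(n)} := \{P^{(n)}_{\theta;f} \mid \theta \in \Theta, f \in \mathcal{F}_0\}$; $\mathcal{P}^{(n)}_f := \{P^{(n)}_{\theta;f} \mid \theta \in \Theta\}$ for fixed $f \in \mathcal{F}_0$;

$\mathcal{P}^{(n)}_{\mu;f} := \{P^{(n)}_{\mu,L;f} \mid L \in \mathcal{M}_k^1\}$ for fixed $\mu \in \mathbb{R}^k$ and $f \in \mathcal{F}_0$;

$\mathcal{P}^{(n)}_{L}$ or $\mathcal{P}^{(n)}_{\mathrm{vecd}^\circ(L)} := \{P^{(n)}_{\mu,L;f} \mid \mu \in \mathbb{R}^k, f \in \mathcal{F}_0\}$ for fixed $\Pi(A) = L \in \mathcal{M}_k^1$, and

$\mathcal{P}^{(n)}_{\mu,L}$ or $\mathcal{P}^{(n)}_{\mu,\mathrm{vecd}^\circ(L)} := \{P^{(n)}_{\mu,L;f} \mid f \in \mathcal{F}_0\}$ for fixed $\mu \in \mathbb{R}^k$ and $\Pi(A) = L \in \mathcal{M}_k^1$.

All those subfamilies will play a role in the sequel.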

A semiparametric (in the spirit of Bickel et al. (1993)) approach to Independent Component
Analysis (ICA) and, more particularly, the estimation of $A$, requires the uniform local
asymptotic normality (ULAN) of $\mathcal{P}^{(n)}_f$ at any $f$ satisfying adequate regularity assumptions:
see Section 2.2. It is easy to see that ULAN of $\mathcal{P}^{(n)}_f$ (with parameters $\mu$ and $L$) implies that
of $\mathcal{P}^{(n)}_{\mu;f}$ (with parameter $L$) for any given $\mu \in \mathbb{R}^k$.

The model we are interested in involves the family $\mathcal{P}^{(n)}$. Depending on the context,
several distinct semiparametric approaches to ICA are possible: either both the location $\mu$
and the mixing matrix $A$ are parameters of interest with the density $f$ being a nuisance;
or the location $\mu$ is a parameter of interest with nuisance $(A, f)$; or the mixing matrix $A$
(equivalently, $L$) only is of interest and $(\mu, f)$ is a nuisance. Hallin and Werker (2003) have
shown that, under very general conditions, if the parametric submodels associated with
fixed values of the nuisance are uniformly locally asymptotically normal (ULAN), while the
submodels associated with fixed values of the parameter of interest are generated by groups
of transformations, then semiparametrically efficient inference can be based on the maximal
invariants of those groups.

In the present context, $A$ is the parameter of interest, and $(\mu, f)$ is the nuisance.
Consider $f = (f_1, \ldots, f_k)$, and assume that

(A1) $f$ belongs to the subset $\mathcal{F}_{\mathrm{ULAN}}$ of $\mathcal{F}_0$ such that the sequence of (parametric)
subfamilies $\mathcal{P}^{(n)}_{\mu;f}$, with parameter $L$, is ULAN, with central sequence $\Delta^{(n)}_{\mu;f}(L)$ (actually, ULAN
holds at any $(\mu, f)$ iff it holds at $(0, f)$), and

(A2) for all $L \in \mathcal{M}_k^1$ and $n \in \mathbb{N}$, the (nonparametric) subfamily $\mathcal{P}^{(n)}_{L}$ is generated by some
group of transformations $\mathcal{G}^{(n)}(L), \circ$ acting on the observation space $\mathbb{R}^{kn}$, with maximal
invariant $R^{(n)}(L)$.

By Hallin and Werker (2003), the semiparametric efficiency bounds (at $(\mu, f)$, for the problem
where $L$ is the parameter of interest) can be achieved by basing inference on the maximal
invariant $R^{(n)}(L)$; more specifically, on the conditional expectation
$E_{P^{(n)}_{\mu,L;f}}[\Delta^{(n)}_{\mu;f}(L) \mid R^{(n)}(L)]$;
since $R^{(n)}(L)$ is invariant, that conditional expectation moreover is distribution-free
under $\mathcal{P}^{(n)}_{L}$ (hence, also under densities $f$ that do not necessarily belong to $\mathcal{F}_{\mathrm{ULAN}}$).

Section 2.2 establishes the ULAN property (A1) of $\mathcal{P}^{(n)}_{\mu;f}$ for any $\mu$ and $f$ satisfying
some mild regularity assumptions. Let us show here that (A2) holds for any $L \in \mathcal{M}_k^1$
and $n$, and that the maximal invariant is the vector $R^{(n)}(L) = (R_1^{(n)\prime}(L), \ldots, R_n^{(n)\prime}(L))'$,
where $R_i^{(n)}(L) = (R_{i1}^{(n)}(L), \ldots, R_{ik}^{(n)}(L))'$ and $R_{ij}^{(n)}(L)$ is the rank of $(L^{-1}X_i^{(n)})_j$ among
$(L^{-1}X_1^{(n)})_j, \ldots, (L^{-1}X_n^{(n)})_j$; hence also, under $\mathcal{P}^{(n)}_{\mu,L}$, letting

$Z_i^{(n)}(\mu, L) := L^{-1}(X_i^{(n)} - \mu)$, $i = 1, \ldots, n$, (2.4)

the rank of $(Z_i^{(n)}(\mu, L))_j$ among $(Z_1^{(n)}(\mu, L))_j, \ldots, (Z_n^{(n)}(\mu, L))_j$.

The elements $g_h$ of the generating group $\mathcal{G}^{(n)}(L), \circ$ are indexed by the family $\mathcal{H}$ of
k-tuples $h = (h_1, \ldots, h_k)$ of monotone continuous and strictly increasing functions $h_j$ from $\mathbb{R}$
to $\mathbb{R}$ such that $\lim_{z \to \pm\infty} h_j(z) = \pm\infty$, with $g_h \in \mathcal{G}^{(n)}(L)$ defined as

$g_h : x = (x_1', \ldots, x_n')' = ((x_{11}, \ldots, x_{1k}), \ldots, (x_{n1}, \ldots, x_{nk}))' \in \mathbb{R}^{kn} \mapsto g_h(x)$

where

$g_h(x) = \big( (L\,(h_1((L^{-1}x_1)_1), \ldots, h_k((L^{-1}x_1)_k))')', \ldots, (L\,(h_1((L^{-1}x_n)_1), \ldots, h_k((L^{-1}x_n)_k))')' \big)'$.

That is, $\mathcal{G}^{(n)}(L), \circ$ is a transformation-retransformation form of the group of continuous
marginal order-preserving transformations acting componentwise on the $L^{-1}X_i^{(n)}$'s. Standard
results on ranks entail that this group is generating $\mathcal{P}^{(n)}_{L}$ and has maximal invariant $R^{(n)}(L)$.
A similar situation holds when the parameter of interest is $(\mu, L)$; similar ideas then lead
to considering a smaller group $\mathcal{G}_0^{(n)}(L)$, with maximal invariant the componentwise signs and
ranks extending the methods proposed in Hallin et al. (2006 and 2008). This latter approach
is not needed here, where we focus on R-estimation of $L$, but it is considered in Hallin and
Mehta (2013), who study testing problems for location and regression.
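
The invariance of the marginal ranks under the transformation-retransformation group can be checked numerically. The sketch below is an illustration under assumptions: the helper names `residual_ranks` and `g_h`, the matrix `L`, the simulated data and the two increasing functions in `hs` are all hypothetical choices, not objects defined in the paper.

```python
import numpy as np

def residual_ranks(X, L):
    """Marginal ranks (1..n) of the components of L^{-1} X_i, column by column."""
    Z = np.linalg.solve(L, X.T).T              # row i is L^{-1} X_i
    return Z.argsort(axis=0).argsort(axis=0) + 1

def g_h(X, L, hs):
    """Apply h_j to the j-th component of L^{-1} X_i, then map back through L."""
    Z = np.linalg.solve(L, X.T).T
    HZ = np.column_stack([h(Z[:, j]) for j, h in enumerate(hs)])
    return HZ @ L.T

rng = np.random.default_rng(3)
L = np.array([[1.0, 0.4],
              [-0.3, 1.0]])
X = rng.standard_normal((200, 2)) @ L.T
# Two continuous, strictly increasing maps with limits -inf and +inf.
hs = [np.sinh, lambda z: z**3 + 2.0 * z + 1.0]
same = np.array_equal(residual_ranks(X, L), residual_ranks(g_h(X, L, hs), L))
```

Here `same` records whether the ranks computed before and after applying `g_h` coincide, which is the invariance property that makes rank-based statistics distribution-free over the subfamily with fixed $L$.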

The approach by Ilmonen and Paindaveine (2011) is quite parallel. However, although
addressing the problem of estimating the mixing matrix $A$, so that $\mu$ is a nuisance, these
authors do not consider the group $\mathcal{G}^{(n)}(L)$, nor the group $\mathcal{G}_0^{(n)}(L)$. They rather make the
additional assumption that the k component densities $f_j$ all are symmetric with respect to
the origin. Under that assumption, they are using yet another group, which is the
subgroup $\mathcal{G}_+^{(n)}(L)$ of $\mathcal{G}^{(n)}(L)$ corresponding to those $h \in \mathcal{H}$ such that $h_j(-z) = -h_j(z)$ for
all $j = 1, \ldots, k$ and $z \in \mathbb{R}$. The resulting maximal invariant is a vector of componentwise
signed ranks, that is, the vector of componentwise residual signs, along with the vector
$R_+^{(n)}(\mu, L) = (R_{+1}^{(n)\prime}(\mu, L), \ldots, R_{+n}^{(n)\prime}(\mu, L))'$, where $R_{+i}^{(n)}(\mu, L) = (R_{+i1}^{(n)}(\mu, L), \ldots, R_{+ik}^{(n)}(\mu, L))'$,
with $R_{+ij}^{(n)}(\mu, L)$ the rank of $|(Z_i^{(n)}(\mu, L))_j|$ among $|(Z_1^{(n)}(\mu, L))_j|, \ldots, |(Z_n^{(n)}(\mu, L))_j|$. As
a result, their estimators lose root-n consistency as soon as one of the underlying $f_j$'s fails
to be symmetric with respect to zero, an assumption that hardly can be checked for.

2.2 Uniform local asymptotic normality (ULAN) 

Establishing ULAN requires regularity conditions on $f$. The following conditions are
sufficient for $f = (f_1, \ldots, f_k)$ to belong to $\mathcal{F}_{\mathrm{ULAN}}$.

(A3) The component densities $f_j$, $j = 1, \ldots, k$, are absolutely continuous, that is, there
exist k real-valued functions $\dot f_j$ such that, for any $a < b$, $f_j(b) - f_j(a) = \int_a^b \dot f_j(z)\,dz$.

Letting $\varphi_f(z) := (\varphi_{f_1}(z_1), \ldots, \varphi_{f_k}(z_k))'$, $z = (z_1, \ldots, z_k)' \in \mathbb{R}^k$, with $\varphi_{f_j} := -\dot f_j / f_j$, assume
moreover that

(A4) all component densities fj admit finite second-order moments, finite information for 

/oo 
z 2 fj(z)dz, 
-oo 
/oo ^oo 

^(^fj^dz, and J f . := / z 2 ip 2 f .(z)fj(z)dz are finite. 
•oo J — oo 



/oo 
zfj(z)dz 
■oo 
/oo 
cp"j.(z)zfj(z)dz, j = l,...,k, also are finite. Consequently, the quanti- 
•00 

ties Jpq(f) := Zf p 8f., ^g(J) := «/,«/„ and ^ OT (/) : = ^/j «/,«/,» are bounded for ev- 



ery j,p,q G {1, ...,/e}. The information matrix for the ULAN result, in Proposition 2.1 
below, depends on these quantities through 

k k 

G f ■= Sfe- 1 )( e i e i® e ^ e i)+ E {7 9P (/)(e P e;®e g e;) + (e p e;®e g e;)} 

jr'=l P,<7=1 

Pt^9 

fc fc 

+ £ e P e 9 ® (««(/) e « e i + ^p(/) e p e p) + J2 6jpq{f) ( e p< ® e i e j) > ( 2 - 5 ) 

r,s=l j,p,q=l 

where $e_j$ is the jth canonical basis vector of $\mathbb{R}^k$ and $\otimes$ denotes the Kronecker product.

Writing $I_k$ for the $k \times k$ identity matrix, define $C := \sum_{p=1}^{k} \sum_{q=1}^{k-1} e_p e_p' \otimes u_q e_{q+\delta_{q \geq p}}'$, where $u_q$
is the qth canonical basis vector of $\mathbb{R}^{k-1}$ and $e_{q+\delta_{q \geq p}} := \delta_{q \geq p}\, e_{q+1} + (1 - \delta_{q \geq p})\, e_q$, with $\delta_{q \geq p}$
the indicator for $q \geq p$. Then, let $\mathrm{odiag}(M)$ replace the diagonal entries of a matrix $M$ with
zeros. Finally, for any $m \in \mathbb{R}^{k(k-1)}$, define $\mathrm{matd}^\circ(m)$ as the unique $k \times k$ matrix with a
diagonal of zeroes such that $\mathrm{vecd}^\circ(\mathrm{matd}^\circ(m)) = m$.
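
The operators vecd°, matd° and odiag, together with the selection matrix C, translate directly into array manipulations. The Python sketch below uses hypothetical function names (`vecd0`, `matd0`, `odiag`, `c_matrix`) and assumes that the indicator entering the definition of C is the one for $q \geq p$, a reading under which C applied to the column-stacked vec(M) returns vecd°(M).

```python
import numpy as np

def vecd0(M):
    """Stack the columns of M on top of each other, skipping the diagonal."""
    M = np.asarray(M, dtype=float)
    off = ~np.eye(M.shape[0], dtype=bool)
    return M.T[off]          # row-major scan of M' is a column-major scan of M

def matd0(m):
    """Zero-diagonal k x k matrix whose vecd0 equals m."""
    m = np.asarray(m, dtype=float)
    k = int(round((1 + np.sqrt(1 + 4 * m.size)) / 2))   # solves k(k-1) = len(m)
    Mt = np.zeros((k, k))
    Mt[~np.eye(k, dtype=bool)] = m
    return Mt.T

def odiag(M):
    """Copy of M with its diagonal entries replaced by zeros."""
    M = np.array(M, dtype=float)
    np.fill_diagonal(M, 0.0)
    return M

def c_matrix(k):
    """k(k-1) x k^2 matrix C, assumed to satisfy C vec(M) = vecd0(M)."""
    I, U = np.eye(k), np.eye(k - 1)
    C = np.zeros((k * (k - 1), k * k))
    for p in range(k):
        for q in range(k - 1):
            row = q + 1 if q >= p else q     # zero-based index of e_{q + delta_{q>=p}}
            C += np.kron(np.outer(I[p], I[p]), np.outer(U[q], I[row]))
    return C

M = np.arange(1.0, 10.0).reshape(3, 3)
agree = np.allclose(c_matrix(3) @ M.flatten(order="F"), vecd0(M))
```

The flag `agree` checks the assumed identity between C vec(M) and vecd°(M) on a small example.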

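Proposition 2.1. Let $f \in \mathcal{F}_0$ satisfy (A3) and (A4). Then, $f \in \mathcal{F}_{\mathrm{ULAN}}$, and, for any
fixed $\mu \in \mathbb{R}^k$, the sequence of subfamilies $\mathcal{P}^{(n)}_{\mu;f}$, with parameter $L \in \mathcal{M}_k^1$, is ULAN with
central sequence

$\Delta^{(n)}_{\mu,L;f} := C (I_k \otimes L^{-1})' \mathrm{vec}[T^{(n)}_{\mu,L;f}]$, where $T^{(n)}_{\mu,L;f} := n^{-1/2} \sum_{i=1}^{n} (\varphi_f(Z_i^{(n)}) Z_i^{(n)\prime} - I_k)$, (2.6)

where $Z_i^{(n)} := Z_i^{(n)}(\mu, L)$ is defined in (2.4), and full-rank information matrix

$\Gamma_{L;f} := C (I_k \otimes L^{-1})' G_f (I_k \otimes L^{-1}) C'$, (2.7)

with $G_f$ defined in (2.5). Specifically, for any sequence $L^{(n)} = L + O(n^{-1/2}) \in \mathcal{M}_k^1$ and any
bounded sequence $\tau^{(n)} \in \mathbb{R}^{k(k-1)}$,

$\log \frac{dP^{(n)}_{\mu, L^{(n)} + n^{-1/2}\mathrm{matd}^\circ(\tau^{(n)}); f}}{dP^{(n)}_{\mu, L^{(n)}; f}} = \tau^{(n)\prime} \Delta^{(n)}_{\mu, L^{(n)}; f} - \frac{1}{2} \tau^{(n)\prime} \Gamma_{L;f} \tau^{(n)} + o_P(1)$ (2.8)

and $\Delta^{(n)}_{\mu, L^{(n)}; f} \to \mathcal{N}_{k(k-1)}(0, \Gamma_{L;f})$ in distribution, as $n \to \infty$ under $P^{(n)}_{\mu, L^{(n)}; f}$.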

This ULAN property extends that established by Oja et al. (2009) under the additional
assumption that each component density $f_j$ is symmetric. Symmetry for every $f_j$ implies
that the quantities $\alpha_{f_j}$ and $\kappa_{f_j}$, hence also the quantities $\varsigma_{pq}$ and $\varrho_{jpq}$, all take value zero
for $j, p, q \in \{1, \ldots, k\}$; therefore, dropping this assumption of symmetry affects the
information matrix (2.7) through $G_f$ in (2.5), which explains why our $\Gamma_{L;f}$ differs from theirs.



2.3 Rank-based versions of central sequences 



The ULAN result from Proposition 2.1 allows the construction of parametrically efficient
inference procedures on the mixing matrix $L \in \mathcal{M}_k^1$ at any given $f$ and $\mu$. In practice,
these two nuisance parameters are not known. In general, misspecifying either or both of
them leads to invalid inference: tests that fail to reach the nominal asymptotic level and
estimators that do not achieve root-n consistency. Therefore, the semiparametric approach
under which both $f$ and $\mu$ are unspecified is the most sensible one.

The standard semiparametric approach to the problem is the tangent space projection
method described in the monograph by Bickel et al. (1993). That approach has been
taken by Chen and Bickel (2006) and involves estimating the k component density scores;
consequently, its effectiveness is limited in the absence of large sample sizes or in the
presence of outliers.

As in Ilmonen and Paindaveine (2011), we consider, instead, the result of Hallin and
Werker (2003) showing that, under very general conditions, the parametric central sequence
conditioned on the maximal invariant mentioned in (A2) is a version (central sequences are
always defined up to $o_P(1)$ quantities) of the semiparametrically efficient central sequence
based on the tangent space projection. Our maximal invariants, however, are not the same.
Let $F : \mathbb{R}^k \to [0, 1]^k$ and $J_f : [0, 1]^k \to \mathbb{R}^k$ be defined so that, for $z = (z_1, \ldots, z_k)' \in \mathbb{R}^k$,
$F(z) := (F_1(z_1), \ldots, F_k(z_k))'$, with $F_j(z_j) := \int_{-\infty}^{z_j} f_j(z)\,dz$ for $j = 1, \ldots, k$, and,
for $u = (u_1, \ldots, u_k)' \in [0, 1]^k$, $J_f(u) := \varphi_f(F^{-1}(u)) = (\varphi_{f_1}(F_1^{-1}(u_1)), \ldots, \varphi_{f_k}(F_k^{-1}(u_k)))'$,
with $J_{f_j}(u_j) := \varphi_{f_j}(F_j^{-1}(u_j))$ for $j = 1, \ldots, k$. Writing $U_i^{(n)} := (U_{i1}^{(n)}, \ldots, U_{ik}^{(n)})'$
for $U_i^{(n)}(\mu, L) := F(Z_i^{(n)}(\mu, L))$, $i = 1, \ldots, n$, the parametric statistic $T^{(n)}_{\mu,L;f}$ defined
in (2.6) takes the form

$T^{(n)}_{\mu,L;f} = n^{-1/2} \sum_{i=1}^{n} \big( J_f(U_i^{(n)}) F^{-1\prime}(U_i^{(n)}) - I_k \big)$.



Assume moreover that

(A5) for all $j = 1, \ldots, k$, $z \mapsto \varphi_{f_j}(z)$ is the difference of two monotone increasing functions.

Assumption (A5) will be required whenever rank-based statistics with scores $\varphi_{f_j} \circ F_j^{-1}$ are
considered. Conditioning $\Delta^{(n)}_{\mu,L;f}$ on the sigma-field $\mathcal{B}(L)$ generated by the marginal ranks
of the $Z_i^{(n)}(\mu, L)$'s yields

$\underset{\sim}{\Delta}^{(n)}_{L;f:\mathrm{ex}} := E[\Delta^{(n)}_{\mu,L;f} \mid \mathcal{B}(L)] = C (I_k \otimes L^{-1})' \mathrm{vec}[\underset{\sim}{T}^{(n)}_{L;f:\mathrm{ex}}]$, where $\underset{\sim}{T}^{(n)}_{L;f:\mathrm{ex}} := E[T^{(n)}_{\mu,L;f} \mid \mathcal{B}(L)]$; (2.9)

clearly, $\underset{\sim}{\Delta}^{(n)}_{L;f:\mathrm{ex}}$ does not depend on $\mu$. Computing this conditional expectation requires
evaluating, for each $j \in \{1, \ldots, k\}$ and $r \in \{1, \ldots, n\}$,

$E[J_{f_j}(U_{ij}^{(n)}) F_j^{-1}(U_{ij}^{(n)}) \mid R_{ij}^{(n)}(L) = r] = E[J_{f_j}(U_{(r)}^{(n)}) F_j^{-1}(U_{(r)}^{(n)})]$ (2.10)

and, for each $j' \neq j'' \in \{1, \ldots, k\}$ and $r, s \in \{1, \ldots, n\}$,

$E[J_{f_{j'}}(U_{ij'}^{(n)}) F_{j''}^{-1}(U_{ij''}^{(n)}) \mid R_{ij'}^{(n)}(L) = r, R_{ij''}^{(n)}(L) = s] = E[J_{f_{j'}}(U_{(r)}^{(n)})]\, E[F_{j''}^{-1}(U_{(s)}^{(n)})]$, (2.11)

where $U_{(r)}^{(n)}$ and $U_{(s)}^{(n)}$ respectively denote, in a sample $U_1, \ldots, U_n$ of i.i.d. random variables
uniform over (0, 1), the rth and sth order statistics. As a function of r and s, such quantities
are called exact scores; they depend on n, and computing them via numerical integration is
somewhat tedious.

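The so-called approximate scores, in general, are preferable: denoting by

$\frac{1}{n+1} R_i^{(n)}(L) = \Big( \frac{R_{i1}^{(n)}(L)}{n+1}, \ldots, \frac{R_{ik}^{(n)}(L)}{n+1} \Big)'$

the (marginal) normalized ranks, the approximate scores corresponding to (2.10) and (2.11) are

$J_{f_j}\Big(\frac{R_{ij}^{(n)}(L)}{n+1}\Big) F_j^{-1}\Big(\frac{R_{ij}^{(n)}(L)}{n+1}\Big) - \frac{1}{n} \sum_{i=1}^{n} J_{f_j}\Big(\frac{i}{n+1}\Big) F_j^{-1}\Big(\frac{i}{n+1}\Big)$

and

$J_{f_{j'}}\Big(\frac{R_{ij'}^{(n)}(L)}{n+1}\Big) F_{j''}^{-1}\Big(\frac{R_{ij''}^{(n)}(L)}{n+1}\Big) - \frac{1}{n} \sum_{i=1}^{n} J_{f_{j'}}\Big(\frac{i}{n+1}\Big)\, \frac{1}{n} \sum_{i=1}^{n} F_{j''}^{-1}\Big(\frac{i}{n+1}\Big)$, (2.12)

respectively. Letting $1_k \in \mathbb{R}^k$ be the k-dimensional vector of ones, the approximate-score
version of the central sequence is thus

$\underset{\sim}{\Delta}^{(n)}_{L;f} := C (I_k \otimes L^{-1})' \mathrm{vec}[\underset{\sim}{T}^{(n)}_{L;f}]$, (2.13)

where

$\underset{\sim}{T}^{(n)}_{L;f} := \mathrm{odiag}\Big[ n^{-1/2} \sum_{i=1}^{n} \Big( J_f\Big(\frac{R_i^{(n)}(L)}{n+1}\Big) F^{-1\prime}\Big(\frac{R_i^{(n)}(L)}{n+1}\Big) - \bar{J}_f^{(n)}\, \overline{F^{-1}}^{(n)\prime} \Big) \Big]$, (2.14)

with $\bar{J}_f^{(n)} := \frac{1}{n} \sum_{i=1}^{n} J_f\big(\frac{i}{n+1} 1_k\big)$ and $\overline{F^{-1}}^{(n)} := \frac{1}{n} \sum_{i=1}^{n} F^{-1}\big(\frac{i}{n+1} 1_k\big)$.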


The following proposition, by establishing the asymptotic equivalence between the exact-
and approximate-score forms (2.9) and (2.13), shows that (2.13) indeed is a version of the
semiparametrically efficient central sequence for the problem.

Proposition 2.2. Fix $\mu \in \mathbb{R}^k$, $L \in \mathcal{M}_k^1$, and $f \in \mathcal{F}_{\mathrm{ULAN}}$ satisfying (A5). Then, under $P^{(n)}_{\mu,L;f}$,
(i) $\underset{\sim}{\Delta}^{(n)}_{L;f} = \underset{\sim}{\Delta}^{(n)}_{L;f:\mathrm{ex}} + o_{L^2}(1)$ and (ii) $\underset{\sim}{\Delta}^{(n)}_{L;f} = \Delta^{(n)*}_{\mu,L;f} + o_P(1)$,
as $n \to \infty$, where $\Delta^{(n)*}_{\mu,L;f}$ is a semiparametrically efficient (at $L$, $\mu$, and $f$) central sequence.
Consequently, $\underset{\sim}{\Delta}^{(n)}_{L;f}$ can be used to construct semiparametrically efficient (at $f$,
irrespective of $\mu$) estimation procedures for $L$. Contrary to those based on $\Delta^{(n)*}_{\mu,L;f}$, the
R-estimators derived from $\underset{\sim}{\Delta}^{(n)}_{L;f}$ remain root-n consistent, though, under most component
densities $g \in \mathcal{F}_{\mathrm{ULAN}}$, $g \neq f$. And, unlike those proposed by Ilmonen and Paindaveine (2011),
they do not require $f$ nor $g$ to be symmetric.

The asymptotic representation for the rank-based central sequence $\underset{\sim}{\Delta}^{(n)}_{L;f}$ under $P^{(n)}_{\mu,L;g}$,
where $g \in \mathcal{F}_0$ is not necessarily equal to $f \in \mathcal{F}_{\mathrm{ULAN}}$, is described in the next proposition.
If, additionally, $g \in \mathcal{F}_{\mathrm{ULAN}}$, the asymptotic distribution for $\underset{\sim}{\Delta}^{(n)}_{L;f}$ can be made explicit. For
every $p \neq q \in \{1, \ldots, k\}$, let

$\gamma^*_{pq}(f, g) := \int_0^1 \varphi_{f_p}(F_p^{-1}(u))\, \varphi_{g_p}(G_p^{-1}(u))\,du\, \Big( \int_0^1 F_q^{-1}(u)\, G_q^{-1}(u)\,du - \alpha_{f_q} \alpha_{g_q} \Big)$

and $\rho^*_{pq}(f, g) := \int_0^1 F_p^{-1}(u)\, \varphi_{g_p}(G_p^{-1}(u))\,du \int_0^1 \varphi_{f_q}(F_q^{-1}(u))\, G_q^{-1}(u)\,du$.

The quantities $\gamma^*_{pq}(f, g)$ and $\rho^*_{pq}(f, g)$ are referred to as cross-information quantities; note
that $\gamma^*_{pq}(f, f) = \gamma_{pq}(f) - \varsigma_{pq}(f)$ and $\rho^*_{pq}(f, f) = 1$. Then, define

$\Gamma^*_{L;f,g} := C (I_k \otimes L^{-1})' G^*_{f,g} (I_k \otimes L^{-1}) C'$ (2.15)

where $G^*_{f,g} := \sum_{p \neq q = 1}^{k} \gamma^*_{qp}(f, g) (e_p e_p' \otimes e_q e_q') + \rho^*_{pq}(f, g) (e_p e_q' \otimes e_q e_p')$, and write $\Gamma^*_{L;f}$ for $\Gamma^*_{L;f,f}$.
Remark that $\Gamma^*_{L;f,g}$ depends on $g$ only through $\gamma^*_{pq}(f, g)$ and $\rho^*_{pq}(f, g)$.
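
The cross-information quantities are products of one-dimensional integrals over (0, 1), so they can be approximated by simple quadrature once quantile and score functions are available. The Python sketch below is an illustration under explicit assumptions: the two densities (logistic and standard normal), the midpoint rule, and the helper names `rho_star` and `gamma_star` are choices made here; the grouping of terms in `gamma_star` (first integral times the centered second integral) is one reading of the definition above and should be treated as an assumption.

```python
import numpy as np
from statistics import NormalDist

N = 50_000
u = (np.arange(N) + 0.5) / N             # midpoint grid on (0, 1)

# Each density is described by its quantile function and its location score.
logistic = {"quantile": lambda v: np.log(v / (1.0 - v)),
            "score": lambda z: np.tanh(z / 2.0)}
normal = {"quantile": np.vectorize(NormalDist().inv_cdf),
          "score": lambda z: z}

def integral(values):
    """Midpoint-rule approximation of an integral over (0, 1)."""
    return float(np.mean(values))

def rho_star(f_p, g_p, f_q, g_q):
    a = integral(f_p["quantile"](u) * g_p["score"](g_p["quantile"](u)))
    b = integral(f_q["score"](f_q["quantile"](u)) * g_q["quantile"](u))
    return a * b

def gamma_star(f_p, g_p, f_q, g_q):
    info = integral(f_p["score"](f_p["quantile"](u)) * g_p["score"](g_p["quantile"](u)))
    cross = integral(f_q["quantile"](u) * g_q["quantile"](u))
    alpha_f = integral(f_q["quantile"](u))   # mean of f_q
    alpha_g = integral(g_q["quantile"](u))   # mean of g_q
    return info * (cross - alpha_f * alpha_g)

rho_ff = rho_star(logistic, logistic, normal, normal)    # case g = f
rho_fg = rho_star(logistic, normal, logistic, normal)    # logistic scores, normal data
```

With $g = f$, `rho_ff` should be close to one, matching the remark that $\rho^*_{pq}(f, f) = 1$; `rho_fg` illustrates a misspecified reference density.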

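Proposition 2.3. Fix $f \in \mathcal{F}_{\mathrm{ULAN}}$, $\mu \in \mathbb{R}^k$, and $L \in \mathcal{M}_k^1$; with $Z_i^{(n)} := Z_i^{(n)}(\mu, L)$ defined
in (2.4), let $\bar{J}_f^{(n)} := \frac{1}{n} \sum_{i=1}^{n} J_f(G(Z_i^{(n)}))$ and $\overline{F^{-1}}^{(n)} := \frac{1}{n} \sum_{i=1}^{n} F^{-1}(G(Z_i^{(n)}))$. Then,

(i) if $g \in \mathcal{F}_0$, $\underset{\sim}{\Delta}^{(n)}_{L;f} = \Delta^{\circ(n)}_{\mu,L;f,g} + o_{L^2}(1)$ as $n \to \infty$, under $P^{(n)}_{\mu,L;g}$, where

$\Delta^{\circ(n)}_{\mu,L;f,g} := C (I_k \otimes L^{-1})' \mathrm{vec}[T^{\circ(n)}_{\mu,L;f,g}]$

and

$T^{\circ(n)}_{\mu,L;f,g} := \mathrm{odiag}\Big[ n^{-1/2} \sum_{i=1}^{n} \big( J_f(G(Z_i^{(n)})) F^{-1\prime}(G(Z_i^{(n)})) - \bar{J}_f^{(n)}\, \overline{F^{-1}}^{(n)\prime} \big) \Big]$. (2.16)

(ii) Suppose furthermore that $g \in \mathcal{F}_{\mathrm{ULAN}}$, and fix $\tau \in \mathbb{R}^{k(k-1)}$ so that $L + n^{-1/2}\mathrm{matd}^\circ(\tau) \in \mathcal{M}_k^1$.
Then, $\underset{\sim}{\Delta}^{(n)}_{L;f} \to \mathcal{N}_{k(k-1)}(\Gamma^*_{L;f,g}\tau, \Gamma^*_{L;f})$ in distribution as $n \to \infty$, under $P^{(n)}_{\mu, L + n^{-1/2}\mathrm{matd}^\circ(\tau); g}$, with $\Gamma^*_{L;f,g}$
defined in (2.15). If $\tau = 0_{k(k-1)}$, $g \in \mathcal{F}_0$ is sufficient for this convergence to hold.

(iii) If again, $g \in \mathcal{F}_{\mathrm{ULAN}}$ and $\tau \in \mathbb{R}^{k(k-1)}$ is as defined in (ii), then, as $n \to \infty$, under $P^{(n)}_{\mu,L;g}$,

$\underset{\sim}{\Delta}^{(n)}_{L + n^{-1/2}\mathrm{matd}^\circ(\tau); f} - \underset{\sim}{\Delta}^{(n)}_{L;f} = -\Gamma^*_{L;f,g}\tau + o_P(1)$. (2.17)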

In Section 3, our R-estimation procedures require evaluating the f-score rank-based
central sequence, for $f \in \mathcal{F}_{\mathrm{ULAN}}$, at a preliminary root-n consistent estimator $\hat{L}^{(n)}$ of $L$. The
asymptotic impact of substituting $\hat{L}^{(n)}$ for $L$ does not directly follow from Proposition 2.3(iii)
because the perturbation $\tau$ in (2.17) is a deterministic quantity. Lemma 4.4 in Kreiss (1987)
provides sufficient conditions for Proposition 2.3(iii) to hold when replacing $\tau$ with a sequence
of random vectors, $\hat{\tau}^{(n)}$, $n \in \mathbb{N}$. More precisely, if

(C1a) $\hat{\tau}^{(n)} = O_P(1)$, as $n \to \infty$, and

(C1b) there exists an integer $N < \infty$ so that, for all $n \geq N$, $\hat{\tau}^{(n)}$ can take, at most, a finite
number of values within any bounded ball centered at the origin in $\mathbb{R}^{k(k-1)}$,

hold, then (2.17) is still valid with $\tau$ replaced by $\hat{\tau}^{(n)}$.

Let $\hat{L}^{(n)} \in \mathcal{M}_k^1$ be an estimator for $L$. We say that it is root-n consistent under $P^{(n)}_{\mu,L;g}$ and
locally asymptotically discrete if $n^{1/2}\mathrm{vecd}^\circ(\hat{L}^{(n)} - L)$ satisfies (C1a) under $P^{(n)}_{\mu,L;g}$ and (C1b).
Proposition 2.3(iii) and Lemma 4.4 from Kreiss (1987) then yield the following corollary.

Corollary 2.1. Fix $\mu \in \mathbb{R}^k$, $L \in \mathcal{M}_k^1$ and $f, g \in \mathcal{F}_{\mathrm{ULAN}}$. Suppose that $\hat{L}^{(n)}$ is root-n consistent
under $P^{(n)}_{\mu,L;g}$ and locally asymptotically discrete. Then, under $P^{(n)}_{\mu,L;g}$, as $n \to \infty$,

$\underset{\sim}{\Delta}^{(n)}_{\hat{L}^{(n)};f} - \underset{\sim}{\Delta}^{(n)}_{L;f} = -\Gamma^*_{L;f,g}\, n^{1/2}\mathrm{vecd}^\circ(\hat{L}^{(n)} - L) + o_P(1)$. (2.18)



The asymptotic discreteness requirement for the preliminary estimator is not overly
restrictive. Any root-n consistent sequence $\hat{L}^{(n)} := (\hat{L}^{(n)}_{rs}) \in \mathcal{M}_k^1$ indeed can be discretized
as $\hat{L}^{(n)}_{\#} := (\hat{L}^{(n)}_{rs;\#})$, with $\hat{L}^{(n)}_{rs;\#} := (c n^{1/2})^{-1}\, \mathrm{sign}(\hat{L}^{(n)}_{rs})\, \lceil c n^{1/2} |\hat{L}^{(n)}_{rs}| \rceil$, for $r \neq s \in \{1, \ldots, k\}$,
where $c > 0$ is an arbitrary constant and $\lceil x \rceil$ denotes the smallest integer greater than or
equal to $x$. The root-n consistency properties of $\hat{L}^{(n)}$ carry over to $\hat{L}^{(n)}_{\#}$, which by construction
is locally asymptotically discrete and, because $\mathcal{M}_k^1$ is a compact subset of $\mathbb{R}^{k(k-1)}$, still takes
values in $\mathcal{M}_k^1$.
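
The discretization device described above amounts to rounding each off-diagonal entry away from zero on a grid of mesh $(c n^{1/2})^{-1}$. A minimal Python sketch follows; the function name `discretize` and the numerical example are hypothetical, and the diagonal is left equal to one since only entries with $r \neq s$ are discretized.

```python
import numpy as np

def discretize(L_hat, n, c=1.0):
    """Round off-diagonal entries of L_hat away from zero onto a grid of mesh 1/(c sqrt(n))."""
    L_hat = np.asarray(L_hat, dtype=float)
    scale = c * np.sqrt(n)
    D = np.sign(L_hat) * np.ceil(scale * np.abs(L_hat)) / scale
    np.fill_diagonal(D, 1.0)          # diagonal entries are fixed to one by definition
    return D

L_hat = np.array([[1.0, 0.123456, -0.5],
                  [0.3333, 1.0, 0.0],
                  [-0.9871, 0.25, 1.0]])
L_disc = discretize(L_hat, n=400)     # grid mesh 1/20 when c = 1
```

Each off-diagonal entry moves by at most $(c n^{1/2})^{-1}$, which is why root-n consistency is preserved.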



3 R-estimation of the mixing matrix

Assume that a rank test rejects $H_0 : \theta = \theta_0$ against the alternative $H_1 : \theta \neq \theta_0$ for large
values of some test statistic $Q^{(n)}_{\theta_0}(R^{(n)}(\theta_0))$ measurable with respect to the ranks $R^{(n)}(\theta_0)$
of residuals $Z^{(n)}(\theta_0) := (Z_1^{(n)}(\theta_0), \ldots, Z_n^{(n)}(\theta_0))'$, which are i.i.d. if and only if $\theta = \theta_0$. The
original R-estimator for $\theta \in \Theta$, as proposed by Hodges and Lehmann (1963), is defined
as $\hat{\theta}^{(n)}_{\mathrm{HL}} := \arg\min_{\theta \in \Theta} Q^{(n)}_{\theta}(R^{(n)}(\theta))$.

Even for simple problems such as location, regression, etc. involving a low-dimensional
parameter $\theta$, minimizing $Q^{(n)}_{\theta}(R^{(n)}(\theta))$ is fraught with difficulty: as a function of $\theta$, it is
piecewise constant, discontinuous, and generally non-convex. In the present case of a
$k(k-1)$-dimensional parameter space $\mathcal{M}_k^1$, solving this problem typically would require an infeasible
grid-search in relatively high dimension.

As an alternative, we consider the one-step R-estimators described in Hallin, Oja and
Paindaveine (2006) and Hallin and Paindaveine (2013), which possess advantageous features
such as expedient computation and straightforward asymptotic properties, and provide a
consistent estimator for the asymptotic covariance matrix as a by-product. Those one-step
estimators are computed from a preliminary root-n consistent estimator $\hat{L}^{(n)}$ and the
resulting value $\underset{\sim}{\Delta}^{(n)}_{\hat{L}^{(n)};f}$ of the rank-based central sequence associated with some reference
density $f \in \mathcal{F}_{\mathrm{ULAN}}$ satisfying (A5).

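3.1 One-step R-estimation

For fixed $f \in \mathcal{F}_{\mathrm{ULAN}}$, assume that

(C1) there exists a sequence of estimators $\hat{L}^{(n)} \in \mathcal{M}_k^1$ of the parameter $L \in \mathcal{M}_k^1$ that are
both root-n consistent and locally asymptotically discrete, under $P^{(n)}_{\mu,L;g}$ for any $\mu \in \mathbb{R}^k$,
$L \in \mathcal{M}_k^1$, and $g \in \mathcal{F}_{\mathrm{ULAN}}$, and, furthermore,

(C2) for all $p \neq q \in \{1, \ldots, k\}$, there exist consistent (under $\mathcal{P}^{(n)}_g$ for every $g \in \mathcal{F}_{\mathrm{ULAN}}$)
and locally asymptotically discrete sequences $\hat{\gamma}^*_{pq}(f)$ and $\hat{\rho}^*_{pq}(f)$ of estimators for the
cross-information quantities $\gamma^*_{pq}(f, g)$ and $\rho^*_{pq}(f, g)$.

For any $f \in \mathcal{F}_{\mathrm{ULAN}}$, the one-step R-estimator for $L \in \mathcal{M}_k^1$ based on f-scores is the $k \times k$
matrix $\underset{\sim}{L}^{(n)}_f \in \mathcal{M}_k^1$ defined by

$\mathrm{vecd}^\circ(\underset{\sim}{L}^{(n)}_f) = \mathrm{vecd}^\circ(\hat{L}^{(n)}) + n^{-1/2} \big(\hat{\Gamma}^*_{\hat{L}^{(n)};f}\big)^{-1} \underset{\sim}{\Delta}^{(n)}_{\hat{L}^{(n)};f}$, (3.19)

where $\hat{\Gamma}^*_{\hat{L}^{(n)};f}$ is a consistent estimate of $\Gamma^*_{L;f,g}$. This estimator is constructed by
plugging $\hat{\gamma}^*_{pq}(f)$ and $\hat{\rho}^*_{pq}(f)$ into (2.15). Under Assumptions (C1) and (C2), $\hat{\Gamma}^*_{\hat{L}^{(n)};f}$ is a consistent
estimate of $\Gamma^*_{L;f,g}$. The procedure for obtaining each estimate $\hat{\gamma}^*_{pq}(f)$ and $\hat{\rho}^*_{pq}(f)$ satisfying
(C2) is discussed in Section 3.2. The next proposition establishes the asymptotic distribution
of $\underset{\sim}{L}^{(n)}_f$; its proof parallels that of Theorem 5.1 in Ilmonen and Paindaveine (2011).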
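
The one-step update (3.19) is a single linear solve followed by a correction of the off-diagonal entries of the preliminary estimator. The Python sketch below takes the estimated cross-information matrix and the rank-based central sequence as given inputs; the function name `one_step_update` and the toy numbers are hypothetical, and computing those two inputs from the data is not shown here.

```python
import numpy as np

def one_step_update(L_hat, Gamma_hat, Delta, n):
    """Add n^{-1/2} Gamma_hat^{-1} Delta to vecd°(L_hat) and rebuild the k x k matrix."""
    L_hat = np.asarray(L_hat, dtype=float)
    k = L_hat.shape[0]
    off = ~np.eye(k, dtype=bool)
    step = np.linalg.solve(Gamma_hat, Delta) / np.sqrt(n)
    Lt = L_hat.T.copy()
    Lt[off] += step        # entries of L_hat' scanned row-wise = vecd°(L_hat)
    return Lt.T            # the unit diagonal is left untouched

L_hat = np.array([[1.0, 0.2],
                  [0.4, 1.0]])
Gamma_hat = 2.0 * np.eye(2)            # toy value standing in for the estimate of (2.15)
Delta = np.array([1.0, -1.0])          # toy value standing in for the central sequence
L_tilde = one_step_update(L_hat, Gamma_hat, Delta, n=4)
```

Solving the linear system rather than forming an explicit inverse is a numerical convenience only; it does not remove the difficulty that arises when the estimated matrix is close to singular.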

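Proposition 3.1. Fix a reference density $f \in \mathcal{F}_{\mathrm{ULAN}}$. Then,

(i) for any $\mu \in \mathbb{R}^k$, $L \in \mathcal{M}_k^1$, and $g \in \mathcal{F}_{\mathrm{ULAN}}$, the one-step R-estimator (3.19) is such that

$n^{1/2}\mathrm{vecd}^\circ(\underset{\sim}{L}^{(n)}_f - L) \to \mathcal{N}_{k(k-1)}\big(0, (\Gamma^*_{L;f,g})^{-1} \Gamma^*_{L;f} (\Gamma^*_{L;f,g})^{-1\prime}\big)$ (3.20)

in distribution as $n \to \infty$, under $P^{(n)}_{\mu,L;g}$, and

(ii) if, moreover, $f = g$, then $(\Gamma^*_{L;f,g})^{-1} \Gamma^*_{L;f} (\Gamma^*_{L;f,g})^{-1\prime} = (\Gamma^*_{L;f})^{-1}$, and $\underset{\sim}{L}^{(n)}_f$ is a
semiparametrically efficient (at $f$) estimate of $L$.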

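The R-estimator $\underline{\tilde{L}}^{(n)}_f$ can be written in a form that avoids inverting $\hat\Gamma^*_{\tilde{L}^{(n)};f}$, which can be numerically singular when estimated in practice. Define therefore the $k \times k$ matrices $\hat{A}^{(n)}_f := (\hat\alpha_{pq}(f))_{p,q=1}^k$ and $\hat{B}^{(n)}_f := (\hat\beta_{pq}(f))_{p,q=1}^k$ with zeroes on the diagonal and, for every $p \neq q \in \{1, \dots, k\}$,

$$\hat\alpha_{pq}(f) := \frac{\hat\gamma^*_{qp}(f)}{\hat\gamma^*_{pq}(f)\hat\gamma^*_{qp}(f) - \hat\rho^*_{pq}(f)\hat\rho^*_{qp}(f)} \quad \text{and} \quad \hat\beta_{pq}(f) := \frac{-\hat\rho^*_{pq}(f)}{\hat\gamma^*_{pq}(f)\hat\gamma^*_{qp}(f) - \hat\rho^*_{pq}(f)\hat\rho^*_{qp}(f)}.$$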



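Letting $A \odot B = (a_{pq} b_{pq})$ denote the Hadamard product between two matrices $A = (a_{pq})$ and $B = (b_{pq})$ of the same size, define

$$\hat{N}^{(n)}_{\tilde{L}^{(n)};f} := \big(\hat{A}^{(n)}_f \odot \underline{T}^{(n)}_{\tilde{L}^{(n)};f}\big) + \big(\hat{B}^{(n)}_f \odot \underline{T}^{(n)\prime}_{\tilde{L}^{(n)};f}\big), \qquad (3.21)$$

with $\underline{T}^{(n)}_{L;f}$ defined in (2.14). Theorem 5.2 in Ilmonen and Paindaveine (2011) then implies that $\underline{\tilde{L}}^{(n)}_f$ can be expressed as

$$\underline{\tilde{L}}^{(n)}_f = \tilde{L}^{(n)} + n^{-1/2}\, \tilde{L}^{(n)} \Big[\hat{N}^{(n)}_{\tilde{L}^{(n)};f} - \mathrm{diag}\big(\tilde{L}^{(n)} \hat{N}^{(n)}_{\tilde{L}^{(n)};f}\big)\Big]. \qquad (3.22)$$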
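As a minimal numerical sketch of the update (3.22), assuming the matrix of (3.21) has already been computed, the correction can be applied in a few lines. The function name `one_step_update` and its arguments are hypothetical and only illustrate the matrix algebra; `diag(.)` is read here as the diagonal matrix built from the diagonal of its argument.

```python
import numpy as np

def one_step_update(L_tilde: np.ndarray, N_hat: np.ndarray, n: int) -> np.ndarray:
    """Return L + n^{-1/2} L [N - diag(L N)], the form given in (3.22).

    L_tilde : preliminary k x k estimate with unit diagonal.
    N_hat   : k x k matrix playing the role of (3.21); assumed given.
    n       : sample size.
    """
    LN = L_tilde @ N_hat
    # Keep only the diagonal of L N, as a diagonal matrix.
    correction = N_hat - np.diag(np.diag(LN))
    return L_tilde + (L_tilde @ correction) / np.sqrt(n)
```

Because the preliminary estimate has unit diagonal, the diagonal of the correction term vanishes, so the updated matrix keeps a unit diagonal as well.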



3.2 Consistent estimation of cross-information quantities 



A critical point in computing $\underline{\tilde{L}}^{(n)}_f$ in (3.22) is the consistent estimation of the cross-information quantities in $\Gamma^*_{L;f,g}$. To tackle this issue, we exploit the asymptotic linearity (2.17) of $\underline{\Delta}^{(n)}_{\tilde{L}^{(n)};f}$ using a method first proposed by Hallin et al. (2006) in the context of the R-estimation of a scatter matrix in an elliptical model, and further developed by Cassart et al. (2010) and Hallin and Paindaveine (2013). In the present case, we have to consistently estimate a total of $2k(k-1)$ cross-information quantities appearing in $\Gamma^*_{L;f,g}$.

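Fixing $f \in \mathcal{F}_{ULAN}$, define, for $\lambda > 0$ and $r \neq s \in \{1, \dots, k\}$, the mappings

$$\lambda \mapsto h^{\gamma^*_{rs}}(\lambda) := \big(\underline{T}^{(n)}_{\tilde{L}^{(n)};f}\big)_{rs} \big(\underline{T}^{(n)}_{\tilde{L}^{\gamma^*_{rs}}_{\lambda};f}\big)_{rs} \quad \text{and} \quad \lambda \mapsto h^{\rho^*_{rs}}(\lambda) := \big(\underline{T}^{(n)}_{\tilde{L}^{(n)};f}\big)_{sr} \big(\underline{T}^{(n)}_{\tilde{L}^{\rho^*_{rs}}_{\lambda};f}\big)_{sr} \qquad (3.23)$$

(from $\mathbb{R}^+$ to $\mathbb{R}$), where

$$\tilde{L}^{\gamma^*_{rs}}_{\lambda} := \tilde{L}^{(n)} + n^{-1/2} \lambda \big(\underline{T}^{(n)}_{\tilde{L}^{(n)};f}\big)_{rs}\, \tilde{L}^{(n)} \big(e_r e_s' - \mathrm{diag}(\tilde{L}^{(n)} e_r e_s')\big)$$

and

$$\tilde{L}^{\rho^*_{rs}}_{\lambda} := \tilde{L}^{(n)} + n^{-1/2} \lambda \big(\underline{T}^{(n)}_{\tilde{L}^{(n)};f}\big)_{sr}\, \tilde{L}^{(n)} \big(e_r e_s' - \mathrm{diag}(\tilde{L}^{(n)} e_r e_s')\big),$$

with $\underline{T}^{(n)}_{L;f}$ defined in (2.14). Assume, additionally, that

(C3) for fixed $f, g \in \mathcal{F}_{ULAN}$, $\mu \in \mathbb{R}^k$, and $L \in \mathcal{M}_k^1$, the sequence $\tilde{L}^{(n)}$ of preliminary estimators (satisfying (C1)) is such that each element in $\underline{T}^{(n)}_{\tilde{L}^{(n)};f}$ is bounded from below by a positive constant with probability tending to one under $P^{(n)}_{\mu,L,g}$. More precisely, for all $\varepsilon > 0$, there exist $\delta_\varepsilon > 0$ and an integer $N_\varepsilon$ such that $P^{(n)}_{\mu,L,g}\big[\big(\underline{T}^{(n)}_{\tilde{L}^{(n)};f}\big)_{rs} \geq \delta_\varepsilon\big] \geq 1 - \varepsilon$ for all $n \geq N_\varepsilon$ and $r \neq s \in \{1, \dots, k\}$.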



This assumption is satisfied by most root-n consistent estimators for the mixing matrix; see 

Section 4 for a discussion. 

The following lemma is adapted from Hallin and Paindaveine (2013). 

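Lemma 3.1. Fix $f, g \in \mathcal{F}_{ULAN}$, $\mu \in \mathbb{R}^k$, and $L \in \mathcal{M}_k^1$. Let $\tilde{L}^{(n)}$ be a sequence of preliminary estimators for L satisfying (C1) and (C3). For every $r \neq s \in \{1, \dots, k\}$, the mappings $h^{\gamma^*_{rs}}$ and $h^{\rho^*_{rs}}$ defined in (3.23) satisfy, for $\lambda > 0$,

$$h^{\gamma^*_{rs}}(\lambda) = \big(1 - \lambda \gamma^*_{rs}(f,g)\big) \big(\underline{T}^{(n)}_{\tilde{L}^{(n)};f}\big)^2_{rs} + o_P(1) \quad \text{and} \quad h^{\rho^*_{rs}}(\lambda) = \big(1 - \lambda \rho^*_{rs}(f,g)\big) \big(\underline{T}^{(n)}_{\tilde{L}^{(n)};f}\big)^2_{sr} + o_P(1)$$

as $n \to \infty$, under $P^{(n)}_{\mu,L,g}$. Furthermore, each mapping is almost surely positive for $\lambda = 0$.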



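By Lemma 3.1, the mappings $h^{\gamma^*_{rs}}$ and $h^{\rho^*_{rs}}$ are both positive at $\lambda = 0$ and, up to $o_P(1)$'s under $P^{(n)}_{\mu,L,g}$, are linear with a negative slope. Therefore, intuitively appealing estimators for $\gamma^*_{rs}(f,g)$ and $\rho^*_{rs}(f,g)$ would be, respectively, $(\tilde\gamma^*_{rs}(f,g))^{-1} := \inf\{\lambda \in \mathbb{R} : h^{\gamma^*_{rs}}(\lambda) < 0\}$ and $(\tilde\rho^*_{rs}(f,g))^{-1} := \inf\{\lambda \in \mathbb{R} : h^{\rho^*_{rs}}(\lambda) < 0\}$; estimators for $\rho_{rs}(f,g)$ would be defined in an analogous manner. However, these estimators are not asymptotically discrete. Instead, taking $\lambda_j := j/c$ for some large $c > 0$ and $j \in \mathbb{Z}$, let

$$(\hat\gamma^*_{rs}(f))^{-1} := \lambda^-_{\gamma^*_{rs}} + c^{-1}\, \frac{h^{\gamma^*_{rs}}(\lambda^-_{\gamma^*_{rs}})}{h^{\gamma^*_{rs}}(\lambda^-_{\gamma^*_{rs}}) - h^{\gamma^*_{rs}}(\lambda^+_{\gamma^*_{rs}})}, \qquad (3.24)$$

with $\lambda^-_{\gamma^*_{rs}} := \max_{j \in \mathbb{Z}}\{\lambda_j : h^{\gamma^*_{rs}}(\lambda_j) > 0\}$ and $\lambda^+_{\gamma^*_{rs}} := \min_{j \in \mathbb{Z}}\{\lambda_j : h^{\gamma^*_{rs}}(\lambda_j) < 0\}$. Similarly put

$$(\hat\rho^*_{rs}(f))^{-1} := \lambda^-_{\rho^*_{rs}} + c^{-1}\, \frac{h^{\rho^*_{rs}}(\lambda^-_{\rho^*_{rs}})}{h^{\rho^*_{rs}}(\lambda^-_{\rho^*_{rs}}) - h^{\rho^*_{rs}}(\lambda^+_{\rho^*_{rs}})}, \qquad (3.25)$$

with $\lambda^-_{\rho^*_{rs}} := \max_{j \in \mathbb{Z}}\{\lambda_j : h^{\rho^*_{rs}}(\lambda_j) > 0\}$ and $\lambda^+_{\rho^*_{rs}} := \min_{j \in \mathbb{Z}}\{\lambda_j : h^{\rho^*_{rs}}(\lambda_j) < 0\}$. The estimators (3.24) and (3.25) can be shown, under Assumptions (C1) and (C3), to satisfy (C2) along the same lines as in Theorem 5.3 of Ilmonen and Paindaveine (2011).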
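The discretized estimators (3.24) and (3.25) amount to locating the sign change of a nearly linear, decreasing map on the grid j/c and interpolating linearly between the two neighbouring grid points. The sketch below illustrates that step for a generic map `h`; the function name, the default value of `c`, and the upward scan from j = 0 (which stops at the first negative value rather than taking a maximum over all of Z) are assumptions made for illustration only.

```python
from typing import Callable

def inverse_cross_information(h: Callable[[float], float],
                              c: float = 100.0,
                              max_steps: int = 1_000_000) -> float:
    """Interpolated root of h on the grid j/c, in the spirit of (3.24)-(3.25).

    h is assumed positive at 0 and, up to noise, linear with negative slope.
    The returned value estimates the inverse of the cross-information quantity.
    """
    h_minus = h(0.0)
    if h_minus <= 0.0:
        raise ValueError("h must be positive at lambda = 0")
    lam_minus = 0.0
    for j in range(1, max_steps + 1):
        lam = j / c
        val = h(lam)
        if val < 0.0:
            # Linear interpolation between the last positive and first
            # negative grid values; the step equals 1/c in the usual case.
            return lam_minus + (lam - lam_minus) * h_minus / (h_minus - val)
        if val > 0.0:
            lam_minus, h_minus = lam, val
    raise RuntimeError("no sign change found on the grid")
```

For an exactly linear map such as h(λ) = (1 - 2.5λ)·0.09, the interpolation recovers the root 0.4 whatever the grid constant c.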



3.3 Data-driven specification of reference density 

While the choice of the reference density f has no impact on the consistency properties of
the corresponding R-estimator $\underline{\tilde{L}}^{(n)}_f$, it has a direct influence on its performance, both
for finite n and as $n \to \infty$; the "closer" f is to the actual density g, the better the performance
of $\underline{\tilde{L}}^{(n)}_f$. The efficiency loss due to a misspecified reference density f is revealed through an
inspection of the cross-information quantities.

Many mixing matrix estimators of L, including those proposed by Chen and Bickel (2006) 
and Bach and Jordan (2002), rely on nonparametric estimates of the underlying component 
densities or scores. However, such nonparametric estimates require large sample sizes to be 
effective due to their sensitivity to tuning parameters such as bandwidth or choice of basis 
functions. For instance, Chen and Bickel (2006) propose estimating score functions using a 
basis of t B-spline functions; the exact choice of t has a significant impact on the resulting
estimator. Furthermore, nonparametric methods tend to be sensitive to outliers, especially 
in the case of small to moderate-sized samples. 

The purpose of using the R-estimators based on f-scores is precisely to increase robustness
against outliers while avoiding nonparametric density estimation. A distinctive feature
of ranks is that they are independent, under the null hypothesis and hence also under
contiguous alternatives, of the corresponding order statistics. That property can be exploited,
in the spirit of Dodge and Jureckova (2000), to select a reference density f that accounts for
features (skewness, kurtosis, etc.) of the actual underlying g: as long as such a selection is
based on order statistics, it has no impact on the validity of R-estimation procedures.

We propose selecting $f := (f_1, \dots, f_k)$ by fitting, componentwise, a parametric density
to the (order statistic of the) residuals associated with the preliminary estimator $\tilde{L}^{(n)}$. If
skewness and kurtosis are to be accounted for, a convenient family of densities is the family
of skew t-distributions (Azzalini and Capitanio 2003), with densities of the form

$$h_\omega(x) = \frac{2}{\sigma}\, t_\nu(z)\, T_{\nu+1}\!\left(\alpha z \left(\frac{\nu + 1}{\nu + z^2}\right)^{1/2}\right) \quad \text{for } x \in \mathbb{R} \text{ and } z := \sigma^{-1}(x - \mu), \qquad (3.26)$$

indexed by $\omega := (\mu, \sigma, \alpha, \nu)$, where $\mu \in \mathbb{R}$ is a location, $\sigma \in \mathbb{R}^+_0$ a scale, $\alpha \in \mathbb{R}$ a skewness
parameter, and $\nu > 0$ the number of degrees of freedom governing the tails; $t_\nu(z)$ and $T_\nu(z)$
are the density and cumulative distribution functions, respectively, of Student's t-distribution
with $\nu$ degrees of freedom. For each $j = 1, \dots, k$, an estimator $\hat\omega_j := (\hat\mu_j, \hat\sigma_j, \hat\alpha_j, \hat\nu_j)$ is obtained
from the residuals $Z^{(n)}_{1,j}(\tilde{L}^{(n)}), \dots, Z^{(n)}_{n,j}(\tilde{L}^{(n)})$ using a method such as maximum likelihood.
Then, the f-score functions used in the R-estimation procedure are those associated with the
skew t-density $h_{\hat\omega_j}$, with $\hat\omega_j = (\hat\mu_j, \hat\sigma_j, \hat\alpha_j, \hat\nu_j)$, thus taking into account the skewness, kurtosis
and tails of the residuals. Data-driven scores, however, clearly need not be restricted to the
family of skew t-densities, and can be selected from other univariate parametric families as
well; in Section 4, we also consider, for instance, the family of stable distributions, indexed
by $\omega := (\mu, \sigma, \beta, \gamma)$, where $\mu$ and $\sigma$ are location and scale, $\beta$ is a skewness parameter ($\beta = 0$
means symmetry), and $\gamma \in (0, 2]$, the tail index, characterizes the tail behavior ($\gamma = 2$ means
Gaussian tails, $\gamma = 1$ Cauchy tails).
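To make the skew t-density (3.26) concrete, here is a minimal sketch that evaluates it with SciPy's Student t density and distribution functions. The function name `skew_t_pdf` and its default arguments are illustrative assumptions; the maximum-likelihood fit itself is not shown.

```python
import numpy as np
from scipy import stats

def skew_t_pdf(x, mu=0.0, sigma=1.0, alpha=0.0, nu=5.0):
    """Skew t-density of (3.26) with location mu, scale sigma,
    skewness alpha, and nu degrees of freedom."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    # Argument of the Student t distribution function with nu + 1 d.f.
    w = alpha * z * np.sqrt((nu + 1.0) / (nu + z ** 2))
    return 2.0 / sigma * stats.t.pdf(z, df=nu) * stats.t.cdf(w, df=nu + 1.0)
```

With alpha = 0 the second factor equals 1/2, so the expression reduces to the usual (rescaled) Student t-density; nonzero alpha shifts mass to one side while keeping the same tail index.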

4 Simulations 

Simulation experiments are conducted to examine the finite-sample performance of the proposed
R-estimation procedure. In the simulations, we evaluate R-estimators $\underline{\tilde{L}}^{(n)}_f$ based on various
preliminary estimators from the literature and a data-driven reference density f, as described
in Section 3. In this section, we describe the precise construction of the four preliminary
estimators to be used, the R-estimator $\underline{\tilde{L}}^{(n)}_f$, and, for the sake of comparison, the $R_+$-estimator
of Ilmonen and Paindaveine (2011). Then we describe the simulation experiment setups and
conclude with a discussion of the simulation results.

4.1 Preliminary, R-, and $R_+$-estimators

4.1.1 The preliminary estimators 

Oja et al. (2006) propose estimating a mixing matrix using two distinct scatter matrices with
the independent components property. A scatter matrix is a $k \times k$ symmetric positive definite
and affine-equivariant function of a sample of n random k-vectors. A scatter matrix is said
to possess the independent components property if, when the sample of random k-vectors
at which it is evaluated is i.i.d. with mutually independent components, all of its off-diagonal
elements are $o_P(1)$ as the sample size grows to infinity. The sample covariance matrix is a
classical example of a scatter matrix exhibiting that property. Generally, however, a scatter
matrix possesses the independent components property only if the mutually independent
components all possess symmetric distributions, a condition which is not satisfied here.

As a remedy, Nordhausen et al. (2008) propose generalizing Oja et al. (2006) by constructing
the estimator using two distinct symmetrized scatter matrices. Symmetrizing a
scatter matrix $S(\cdot)$ entails evaluating the same scatter matrix function at the distinct pairwise
differences of observations from a given sample. Specifically, if $X_1^{(n)}, \dots, X_n^{(n)}$ is an
observed sample of k-vectors, then the symmetrized version of $S(\cdot)$ is defined to be

$$S_*\big(X_1^{(n)}, \dots, X_n^{(n)}\big) := S\big(X^{(n)}_{(1,2)}, \dots, X^{(n)}_{(1,n)}, X^{(n)}_{(2,3)}, \dots, X^{(n)}_{(2,n)}, X^{(n)}_{(3,4)}, \dots, X^{(n)}_{(n-1,n)}\big), \qquad (4.27)$$

where $X^{(n)}_{(i,j)} := X^{(n)}_i - X^{(n)}_j$, for $1 \leq i < j \leq n$, denotes the $n(n-1)/2$ distinct
pairwise differences. If $X^{(n)}_i$ ($i = 1, \dots, n$) is i.i.d. with mutually independent components,
then $X^{(n)}_{(i,j)}$ ($1 \leq i < j \leq n$) also has mutually independent components, each having,
by construction, a symmetric distribution. Consequently, the symmetrized version of any
scatter matrix $S(\cdot)$ has the independent components property.
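The symmetrization in (4.27) can be sketched as follows: form all pairwise differences in the order used in (4.27), then evaluate the scatter functional on them. The helper names (`pairwise_differences`, `symmetrized`, `sample_cov`) are hypothetical, and the sample covariance is used only as a stand-in scatter functional.

```python
import numpy as np

def pairwise_differences(X: np.ndarray) -> np.ndarray:
    """Return the n(n-1)/2 differences X_i - X_j, i < j, of an (n, k) sample,
    ordered (1,2), ..., (1,n), (2,3), ..., (n-1,n)."""
    i, j = np.triu_indices(X.shape[0], k=1)
    return X[i] - X[j]

def sample_cov(X: np.ndarray) -> np.ndarray:
    """Sample covariance with 1/n normalization."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / X.shape[0]

def symmetrized(scatter, X: np.ndarray) -> np.ndarray:
    """Evaluate a scatter functional at the pairwise differences of X."""
    return scatter(pairwise_differences(X))
```

The number of rows passed to the scatter functional grows quadratically in n, which is the source of the computational burden of symmetrized scatter matrices for large samples.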

Letting $S^A_*$ and $S^B_*$ denote the symmetrized versions of two distinct scatter matrices $S^A$
and $S^B$ as in (4.27) and letting $X_1^{(n)}, \dots, X_n^{(n)}$ denote an observed sample of k-variate mixed
data, Nordhausen et al. (2008) propose an estimator $\hat{A}(S^A_*, S^B_*)$ that is the $k \times k$ nonsingular
matrix A simultaneously satisfying

$$S^A_*\big(A^{-1} X_1^{(n)}, \dots, A^{-1} X_n^{(n)}\big) = I_k \quad \text{and} \quad S^B_*\big(A^{-1} X_1^{(n)}, \dots, A^{-1} X_n^{(n)}\big) = D, \qquad (4.28)$$

where D is any full-rank $k \times k$ diagonal matrix. In the simulations below, we construct
preliminary estimators based on the following scatter matrices: the sample covariance

$$S_{COV} := \frac{1}{n} \sum_{i=1}^n \big(X_i^{(n)} - \bar{X}^{(n)}\big)\big(X_i^{(n)} - \bar{X}^{(n)}\big)' \quad \text{where} \quad \bar{X}^{(n)} := \frac{1}{n} \sum_{i=1}^n X_i^{(n)};$$

the fourth-order scatter matrix

$$S_{COV4} := \frac{1}{n} \sum_{i=1}^n \big(X_i^{(n)} - \bar{X}^{(n)}\big)' (S_{COV})^{-1} \big(X_i^{(n)} - \bar{X}^{(n)}\big)\, \big(X_i^{(n)} - \bar{X}^{(n)}\big)\big(X_i^{(n)} - \bar{X}^{(n)}\big)';$$

and the van der Waerden rank-based estimator $S_{HOP}$ (Hallin et al. 2006). Letting $S^*_{COV}$, $S^*_{COV4}$,
and $S^*_{HOP}$ denote symmetrized versions of these scatter matrices (4.27), we obtain the preliminary
estimators $\hat{A}_{Fobi} := \hat{A}(S^*_{COV}, S^*_{COV4})$ and $\hat{A}_{HOPCov} := \hat{A}(S^*_{HOP}, S^*_{COV})$, defined in (4.28),
from each sample of k-variate data generated in the simulations. Often referred to as FOBI
in the literature, $\hat{A}_{Fobi}$ was first studied in Cardoso (1989), and is the most usual estimator
of that type. As for $\hat{A}_{HOPCov}$, it can be expected to inherit some of the favorable robustness
properties of the rank-based $S^*_{HOP}$. When computing $\hat{A}_{Fobi}$ and $\hat{A}_{HOPCov}$, obtaining symmetrized
scatter matrices introduces a heavy computational burden; for a sample of size n,
the symmetrization step requires evaluating a scatter matrix from $n(n-1)/2$ pairwise differences.
Because sample sizes in typical ICA applications are large, $\hat{A}_{Fobi}$ and $\hat{A}_{HOPCov}$ may
be impractical.
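One standard way of solving the two conditions in (4.28) is to whiten the data with the first scatter matrix and then diagonalize the second scatter matrix of the whitened data. The sketch below does this for the covariance and fourth-order scatter matrices; for brevity it uses the non-symmetrized versions, which is an assumption departing from the symmetrized estimators used in the simulations, and all function names are hypothetical.

```python
import numpy as np

def cov(X):
    """Sample covariance with 1/n normalization; rows of X are observations."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / X.shape[0]

def cov4(X):
    """Fourth-order scatter: covariance weighted by squared Mahalanobis distance."""
    Xc = X - X.mean(axis=0)
    d2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(cov(X)), Xc)
    return (Xc * d2[:, None]).T @ Xc / X.shape[0]

def two_scatter_mixing(X, scatter_a=cov, scatter_b=cov4):
    """Return A such that scatter_a(A^{-1}X) = I and scatter_b(A^{-1}X) is diagonal."""
    vals, vecs = np.linalg.eigh(scatter_a(X))
    a_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T   # symmetric S_A^{-1/2}
    Y = X @ a_inv_sqrt                                   # whitened observations
    _, U = np.linalg.eigh(scatter_b(Y))                  # rotation diagonalizing S_B
    W = U.T @ a_inv_sqrt                                 # demixing matrix A^{-1}
    return np.linalg.inv(W)
```

Both scatter functionals must be affine-equivariant for the construction to work; symmetrized versions can be supplied through the `scatter_a` and `scatter_b` arguments at the cost of evaluating them on all pairwise differences.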

The FastICA algorithm (Hyvärinen and Oja 1997; in the simulations, we used the
fastICA R package by Marchini et al. (2012), with default settings and the initial demixing
matrix set to identity) exploits the canonical assumption in ICA that at most one source
component possesses a Gaussian distribution. A mixing matrix is selected by maximizing a
contrast function that approximates the sample negentropy sequentially for each component.
Because negentropy is a measure of non-Gaussianity, the estimated components obtained via
the FastICA estimator $\hat{A}_{FIca}$ have low cross-correlation, and all but one of them have empirical
distributions that are not well approximated by a Gaussian. Reyhani et al. (2012) establish sufficient
conditions for the root-n consistency of $\hat{A}_{FIca}$.

Finally, the Kernel-ICA-KGV or Kernel-ICA algorithm (Bach and Jordan 2003) seeks
a demixing matrix that minimizes the mutual information between the implied independent
components via generalized variance, a construction implicitly measuring non-Gaussianity.
Of all preliminary estimators we considered, $\hat{A}_{KIca}$ (computed from the kernel-ica Matlab
package (Bach 2003) with default settings) performs best in the simulations;
its asymptotic properties have not been well studied, though, and conditions for its root-n
consistency have not been established.

After evaluating each preliminary estimator ($\hat{A}_{Fobi}$, $\hat{A}_{HOPCov}$, $\hat{A}_{FIca}$, and $\hat{A}_{KIca}$) from each
replication, one-step R-estimators are computed from the observationally equivalent

$$\tilde{L}_{Fobi} := \Pi(\hat{A}_{Fobi}), \quad \tilde{L}_{HOPCov} := \Pi(\hat{A}_{HOPCov}), \quad \tilde{L}_{FIca} := \Pi(\hat{A}_{FIca}), \quad \text{and} \quad \tilde{L}_{KIca} := \Pi(\hat{A}_{KIca}), \qquad (4.29)$$

which belong to $\mathcal{M}_k^1$ (see (1.3) for the definition of the mapping $\Pi$).



4.1.2 The R-estimators



As described in Section 3.3, we used data-driven scores from the skew t-family in the construction
of our R-estimators. For each replication of $X_1^{(n)}, \dots, X_n^{(n)}$ and preliminary estimator
$\tilde{L} \in \mathcal{M}_k^1$, we compute the residuals $Z_i^{(n)}(\tilde{L}) := \tilde{L}^{-1} X_i^{(n)}$ for $i = 1, \dots, n$. For
each $j = 1, \dots, k$, a skew t-density $h_{\hat\omega_j}$ (see (3.26)) is fit to the n-tuple $Z^{(n)}_{1,j}(\tilde{L}), \dots, Z^{(n)}_{n,j}(\tilde{L})$
of jth components via maximum likelihood (MLE). In this implementation, a constrained
MLE $\hat\omega_j$, with $\hat\alpha_j \in [-30, 30]$ and $\hat\nu_j \in [3, \infty)$, was adopted for the sake of numerical stability.
The resulting one-step R-estimate then is, with $f := (h_{\hat\omega_1}, \dots, h_{\hat\omega_k})$,

$$\underline{\tilde{L}}^*(\tilde{L}) := \tilde{L} + n^{-1/2}\, \tilde{L} \Big[\hat{N}^{(n)}_{\tilde{L};f} - \mathrm{diag}\big(\tilde{L} \hat{N}^{(n)}_{\tilde{L};f}\big)\Big], \qquad (4.30)$$

where $\hat{N}^{(n)}_{\tilde{L};f}$ is defined in (3.21) (because $\underline{\tilde{L}}^*(\tilde{L})$ is based on data-driven scores, no reference
density is used in the notation).

In the simulations, we also explore the performance of a multistep version of the R-estimator
just described. Taking $\underline{\tilde{L}}^*(\tilde{L})$ as a preliminary, (4.30) indeed is easily iterated:
letting $\underline{\tilde{L}}^*_{(0)}(\tilde{L}) := \tilde{L}$, define, for $t = 1, \dots, T$,

$$\underline{\tilde{L}}^*_{(t)}(\tilde{L}) := \underline{\tilde{L}}^*\big(\underline{\tilde{L}}^*_{(t-1)}(\tilde{L})\big). \qquad (4.31)$$
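The multistep recursion (4.31) is a plain fixed-point style iteration of the one-step map. A minimal sketch, assuming some callable `one_step` that implements (4.30) is available (it is hypothetical here and not defined in this snippet):

```python
from typing import Callable, List, TypeVar

M = TypeVar("M")

def multistep(one_step: Callable[[M], M], preliminary: M, T: int) -> List[M]:
    """Return the iterates t = 0, 1, ..., T of (4.31).

    one_step    : map from an estimate to its one-step update, as in (4.30).
    preliminary : the estimate used at t = 0.
    """
    iterates = [preliminary]
    for _ in range(T):
        # Each step uses the previous iterate as its preliminary estimate.
        iterates.append(one_step(iterates[-1]))
    return iterates
```

Keeping all iterates, rather than only the last one, makes it easy to monitor how the error evolves with the number of steps T.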

4.1.3 The $R_+$-estimators

We also computed the signed-rank $R_+$-estimators proposed by Ilmonen and Paindaveine (2011),
the validity of which requires symmetric component densities. The computation of those
$R_+$-estimators requires not only a root-n consistent preliminary estimator $\tilde{L}^{(n)} \in \mathcal{M}_k^1$, but also
an estimate of the location $\mu \in \mathbb{R}^k$. The preliminary estimators we used are those described
in Section 4.1.1; for location, we adopted the same componentwise median estimator as in
Ilmonen and Paindaveine (2011). To make the comparison a fair one, however, we also
implemented the signed-rank procedure on the basis of data-driven scores, as explained in
Section 4.1.2, restricting the fit, of course, to symmetric Student or stable densities. The
resulting $R_+$-estimators are denoted as $\underline{\tilde{L}}^*_+(\tilde{L})$. Finally, parallel to (4.31), multistep versions
of $\underline{\tilde{L}}^*_+(\tilde{L})$ are easily constructed; the notation $\underline{\tilde{L}}^*_{+(t)}(\tilde{L})$ is used in an obvious way.



4.2 Simulation experiments 

In each simulation experiment, bivariate observations (k = 2) were generated from various
generating processes. Each generating process is characterized by a sample size n and two
component densities, $g_1^{(S)}$ and $g_2^{(S)}$, S = A, ..., L, the list of which is provided in Table 1, yielding
various skewness levels and tail behaviors. We also consider (E) an asymmetric bimodal
mixture distribution. Each marginal distribution has median equal to zero (with location
parameter set accordingly) and unit scale. The same mixing matrix

$$L = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix} \in \mathcal{M}_2^1$$

was used throughout; location, which plays no role, was set to $\mu = 0$. Small (n = 100) and
moderate (n = 1,000) sample sizes were considered.

Table 1: Component densities used in the simulation experiment, all with median zero and unit scale: (a) skew t($\alpha$, $\nu$) denotes
the skew t-density with shape (asymmetry) parameter $\alpha$ and $\nu$ degrees of freedom; (b) stable($\beta$, $\gamma$) denotes the stable density
with skewness parameter $\beta$ and tail index $\gamma$; (c) asymMix-t3 is a mixture of two Student t-distributions with 3 degrees of
freedom; and (d) $t_\nu$ is the classical Student t-distribution with $\nu$ degrees of freedom.



Component densities

S      $g_1^{(S)}$                                   $g_2^{(S)}$

(A)    skew t ($\alpha = 5$, $\nu = 1$)              skew t ($\alpha = 5$, $\nu = 1$)
(B)    skew t ($\alpha = 5$, $\nu = 6$)              skew t ($\alpha = 5$, $\nu = 6$)
(C)    skew t ($\alpha = 5$, $\nu = 10$)             skew t ($\alpha = 5$, $\nu = 10$)
(D)    stable ($\beta = 1$, $\gamma = 1.75$)         skew t ($\alpha = 5$, $\nu = 10$)
(E)    asymMix-t3                                    skew t ($\alpha = 5$, $\nu = 10$)
(F)    Student's $t_5$                               skew t ($\alpha = 5$, $\nu = 10$)
(G)    stable ($\beta = 1$, $\gamma = 1.5$)          stable ($\beta = 1$, $\gamma = 1.5$)
(H)    stable ($\beta = 1$, $\gamma = 1.75$)         stable ($\beta = 1$, $\gamma = 1.75$)
(I)    stable ($\beta = 0$, $\gamma = 1.75$)         stable ($\beta = 0$, $\gamma = 1.75$)
(J)    Cauchy $t_1$                                  Cauchy $t_1$
(K)    Student's $t_5$                               Student's $t_5$
(L)    Student's $t_{10}$                            Student's $t_{10}$
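As an illustration of how one replication of such a setup can be generated, the sketch below draws independent sources, mixes them with the matrix L used throughout, and checks that the residuals computed from the true L recover the sources. It uses setup (J) (two standard Cauchy components), since that density is available directly in NumPy; the variable names and the seed are arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 2

# Mixing matrix used throughout the simulations (unit diagonal).
L = np.array([[1.0, 0.5],
              [0.5, 1.0]])

# Setup (J): two independent standard Cauchy sources (median 0, unit scale).
Z = rng.standard_cauchy(size=(n, k))

# Observations X_i = L Z_i, with location set to 0; rows are observations.
X = Z @ L.T

# Residuals associated with a candidate mixing matrix, here the true L.
Z_back = X @ np.linalg.inv(L).T
```

In a simulation study, the last line would be evaluated at an estimate of L rather than at L itself, and the residuals would then feed the rank-based procedure.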

For each generating process (each combination of n = 100 or 1,000 and $S \in \{A, \dots, L\}$),
the number of replications was set to M = 1,000, and, for each replication, the following
estimators of L were computed:

(a) the preliminary estimators $\tilde{L} = \tilde{L}_{Fobi}$, $\tilde{L}_{HOPCov}$, $\tilde{L}_{FIca}$, and $\tilde{L}_{KIca}$ given in (4.29);

(b) the one-step R-estimators $\underline{\tilde{L}}^*(\tilde{L})$ based on the preliminary ones as listed under (a) and
data-driven skew t-scores;

(c) the one-step $R_+$-estimators $\underline{\tilde{L}}^*_+(\tilde{L})$ based on the preliminary ones as listed under (a)
and data-driven Student's t-scores.

For component densities (A), (C) and (H), moreover, we also computed, for n = 100
and n = 1,000,

(d) the T-multistep versions of the R-estimators based on the preliminary $\tilde{L}_{HOPCov}$ and $\tilde{L}_{KIca}$,
still with data-driven skew t-scores, T = 1, ..., 10.

The performance of each estimator ($\tilde{L}$, $\underline{\tilde{L}}^*$ or $\underline{\tilde{L}}^*_+$) is measured by its Amari error with
respect to L. The Amari error (Amari et al. 1996) AE(A, B) of a $k \times k$ matrix A with
respect to a nonsingular $k \times k$ matrix B is defined as

$$AE(A, B) := \frac{1}{2k(k-1)} \left( \sum_{i=1}^k \left( \frac{\sum_{j=1}^k |w_{ij}|}{\max_j |w_{ij}|} - 1 \right) + \sum_{j=1}^k \left( \frac{\sum_{i=1}^k |w_{ij}|}{\max_i |w_{ij}|} - 1 \right) \right), \qquad (4.32)$$

with $W := B^{-1} A = [w_{ij}]$. The Amari error (which is not a matrix norm) takes values between
0 and 1; AE(A, B) close to 0 indicates higher similarity between A and B. The value
of AE(A, B) is invariant under permutations and (positive) rescaling of rows or columns
of A and B. More precisely, $AE(A, B) = AE(A^*, B^*)$ for any $k \times k$ matrices $A^* := C_1 A C_2$
and $B^* := C_3 B C_4$ so long as each $C_j$ is an arbitrary product of permutation matrices
and diagonal matrices with all diagonal entries being positive. Hence, AE(A, B) = 0 implies
$\Pi(A) = \Pi(B)$, i.e. A and B are observationally equivalent, see (1.3). Therefore, the
Amari error is a natural measure of performance for estimators of mixing matrices in ICA.

Figure 1: Boxplots of Amari errors obtained in M = 1,000 replications of the setup (n, S), n = 100, 1,000, S = A, B, C, for the
preliminary $\tilde{L} = \tilde{L}_{Fobi}$, $\tilde{L}_{HOPCov}$, $\tilde{L}_{FIca}$, $\tilde{L}_{KIca}$, the one-step R-estimator $\underline{\tilde{L}}^*(\tilde{L})$, and the one-step $R_+$-estimator $\underline{\tilde{L}}^*_+(\tilde{L})$ based
on the same preliminaries with data-driven skew t- and Student's t-scores, respectively.

Figure 2: Boxplots of Amari errors obtained in M = 1,000 replications of the setup (n, S), n = 100, 1,000, S = D, E, F, for the
preliminary $\tilde{L} = \tilde{L}_{Fobi}$, $\tilde{L}_{HOPCov}$, $\tilde{L}_{FIca}$, $\tilde{L}_{KIca}$, the one-step R-estimator $\underline{\tilde{L}}^*(\tilde{L})$, and the one-step $R_+$-estimator $\underline{\tilde{L}}^*_+(\tilde{L})$ based
on the same preliminaries with data-driven skew t- and Student's t-scores, respectively.

Figure 3: Boxplots of Amari errors obtained in M = 1,000 replications of the setup (n, S), n = 100, 1,000, S = G, H, I, for the
preliminary $\tilde{L} = \tilde{L}_{Fobi}$, $\tilde{L}_{HOPCov}$, $\tilde{L}_{FIca}$, $\tilde{L}_{KIca}$, the one-step R-estimator $\underline{\tilde{L}}^*(\tilde{L})$, and the one-step $R_+$-estimator $\underline{\tilde{L}}^*_+(\tilde{L})$ based
on the same preliminaries with data-driven skew t- and Student's t-scores, respectively.
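A minimal sketch of the Amari error computation follows. It assumes the usual normalization by 2k(k-1), which is the one consistent with values lying between 0 and 1, and works with W = B^{-1}A as described above; the function name `amari_error` is a choice made for this example.

```python
import numpy as np

def amari_error(A: np.ndarray, B: np.ndarray) -> float:
    """Amari error of A with respect to a nonsingular B, normalized to [0, 1]."""
    W = np.abs(np.linalg.solve(B, A))      # entrywise |B^{-1} A|
    k = W.shape[0]
    # Row part: how far each row is from having a single dominant entry.
    rows = (W.sum(axis=1) / W.max(axis=1) - 1.0).sum()
    # Column part: same for each column.
    cols = (W.sum(axis=0) / W.max(axis=0) - 1.0).sum()
    return float((rows + cols) / (2.0 * k * (k - 1)))
```

The error is zero when W is a permutation matrix times a diagonal matrix, that is, when A equals B up to reordering and rescaling of columns, and it reaches one when all entries of W have the same absolute value.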



Figure 4: Boxplots of Amari errors obtained in M = 1,000 replications of the setup (n, S), n = 100, 1,000, S = J, K, L, for the
preliminary $\tilde{L} = \tilde{L}_{Fobi}$, $\tilde{L}_{HOPCov}$, $\tilde{L}_{FIca}$, $\tilde{L}_{KIca}$, the one-step R-estimator $\underline{\tilde{L}}^*(\tilde{L})$, and the one-step $R_+$-estimator $\underline{\tilde{L}}^*_+(\tilde{L})$ based
on the same preliminaries with data-driven skew t- and Student's t-scores, respectively.

Figures 1-6 provide boxplots of the M = 1,000 Amari distances associated
with the various simulation setups. Since Amari distances are intrinsically nonnegative, these
are "one-sided boxplots", showing the first quartile, the median, the third quartile, and
a 0.95 quantile whisker. Figures 1-4 deal with component densities (A)-(B)-(C),
(D)-(E)-(F), (G)-(H)-(I), and (J)-(K)-(L), respectively. Figures 5-6 show the results for the
T-step versions of the R-estimators based on $\tilde{L}_{HOPCov}$ and $\tilde{L}_{KIca}$, under component densities
(A)-(C)-(H), as described in (d) above.

Figure 5: Boxplots of Amari errors obtained in M = 1,000 replications of the setup (n, S), n = 100, S = A, C, H, for the T-step
R-estimator $\underline{\tilde{L}}^*_{(T)}(\tilde{L})$ based on preliminary $\tilde{L} = \tilde{L}_{HOPCov}$ and $\tilde{L}_{KIca}$, respectively, and data-driven skew t-scores, T = 1, ..., 10.

Inspection of Figures 1-4 reveals that Kernel-ICA is, almost uniformly, and sometimes
quite substantially (see Figure 1, n = 1,000 with 6- and 10-degrees of freedom skew t-component
densities, or Figure 3, n = 1,000 under settings (H) and (I)), the best preliminary.



[Figure 6 panels: setups (n = 1000, A), (n = 1000, C), and (n = 1000, H), each with HOPCov and Kernel-ICA preliminaries; horizontal axis: T (# iterations).]

Figure 6: Boxplots of Amari errors obtained in M = 1,000 replications of the setup (n, S), n = 1,000, S = A, C, H, for the T-step
R-estimator $\underline{\tilde{L}}^*_{(T)}(\tilde{L})$ based on preliminary $\tilde{L} = \tilde{L}_{HOPCov}$ and $\tilde{L}_{KIca}$, respectively, and data-driven skew t-scores, T = 1, ..., 10.

Combined with R-estimation (data-driven skew t-scores), they are the typical winners, even
under symmetric component densities where, in principle, $R_+$-estimators should do better.
The best performances of R-estimators seem to take place under heavy tails (Cauchy and stable
component densities), thanks, probably, to the data-driven selection of scores. Based on
FastICA or Kernel-ICA preliminaries, R-estimators moreover are the only ones providing reasonably
good results under the "mixed cases" of Figure 2 (bimodal/unimodal, stable/skew
$t_{10}$, symmetric/skew t component densities); note that partial symmetry (setup (F)) does
not really help $R_+$-estimation much.

Figures 5 and 6 show how iterating the rank-based correction can improve a poor preliminary.
The HOPCov estimator is typically outperformed by the Kernel-ICA one; however,
after a few iterations, the HOPCov- and Kernel-ICA-based R-estimators perform
quite similarly; the latter, however, needs fewer iterations than the former to reach its best
performance. For n = 1,000, starting from Kernel-ICA, one step is essentially sufficient.

5 An application in image analysis 

The objective of ICA in applications is typically to recover source signals from a sequence
of observed mixed signals. As such, ICA methods are widely used in a variety of contexts where
the fundamental assumptions (1.1)-(1.2) of ICA are unlikely to hold. One of the merits of
existing ICA methods such as FastICA and Kernel-ICA is that they withstand such
theoretically unwarranted applications reasonably well. Such statements, of course, remain unavoidably
vague: in the absence of a formal model, indeed, pertinent benchmarks for performance
evaluation are hard to define. Demixing acoustic signals or images, where "readability" of
the final result appears as an obvious criterion, is an exception. Therefore, in this section,
we apply various ICA estimation methods, including the rank-based ones, to the demixing
of images that clearly do not satisfy the assumptions we have been making throughout this
paper. The results are shown in Figure 7. Their quality is best evaluated by eye-inspection,
but a quantitative assessment can be made via the Amari distances provided in Figures 8a
and 8b. Although traditional ICA techniques provide reasonable results, our rank-based
techniques appear to bring quite significant improvements.

A black-and-white digital image with resolution $h \times w$ ($h, w \in \mathbb{N}$) can be represented by
a pixel matrix $Z = (Z_{rs}) \in [0, 1]^{h \times w}$, where $Z_{rs}$ represents the "greyness" of the pixel located
in the rth row and sth column; if $Z_{rs} = 0$, the pixel is pure black, and if $Z_{rs} = 1$, the pixel is
pure white. In this example, we mix three source images of US currency notes, represented
by the pixel matrices $Z_j = (Z_{j;rs})$, j = 1, 2, 3 (h := 65 and w := 150). These three
source images are turned into three mixed ones, with pixel matrices $X_j = (X_{j;rs})$, j = 1, 2, 3,
where $(X_{1;rs}, X_{2;rs}, X_{3;rs})' = L^*(Z_{1;rs}, Z_{2;rs}, Z_{3;rs})'$, with $L^* = I_3 + 0.95(1_3 - I_3) \in \mathcal{M}_3^1$ (denoting
by $1_3$ a $3 \times 3$ matrix of ones); $L^*$ thus has a diagonal of ones, all off-diagonal entries being 0.95.

The source and mixed images are displayed in Figure 7a.
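The pixelwise mixing just described can be sketched as follows. Random arrays stand in for the three source images (the actual currency-note images are not available here), and the variable names are illustrative; only the mixing matrix and the image dimensions are taken from the text.

```python
import numpy as np

h, w = 65, 150
rng = np.random.default_rng(0)

# Stand-ins for the three source pixel matrices, with grey levels in [0, 1].
Z_imgs = rng.uniform(0.0, 1.0, size=(3, h, w))

# Mixing matrix: ones on the diagonal, 0.95 everywhere else.
L_star = np.eye(3) + 0.95 * (np.ones((3, 3)) - np.eye(3))

# Treat each pixel position (r, s) as one three-dimensional observation.
Z_flat = Z_imgs.reshape(3, h * w)          # 3 x n with n = 9750
X_flat = L_star @ Z_flat                   # mixed observations
X_imgs = X_flat.reshape(3, h, w)           # three mixed pixel matrices

# Demixing with the true matrix recovers the sources exactly.
Z_rec = np.linalg.solve(L_star, X_flat).reshape(3, h, w)
```

Note that mixed pixel values can exceed 1, so displaying the mixed images as grey levels would require some rescaling; how this was handled for Figure 7 is not specified in the text.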

We then performed ICA estimation on the n = 65 x 150 = 9,750 three-dimensional
observations $(X_{1;rs}, X_{2;rs}, X_{3;rs})$ by computing the multistep R-estimators $\underline{\tilde{L}}^*_{(T)}(\tilde{L})$ with
data-driven skew t-scores (4.31) and preliminary estimators $\tilde{L} = \tilde{L}_{Fobi}$, $\tilde{L}_{FIca}$, and $\tilde{L}_{KIca}$ as described
in (4.29), and T = 1, ..., 20; the $\tilde{L}_{HOPCov}$ preliminary was omitted because symmetrizing
the HOP scatter matrix (about $10^8$ pairwise differences) was computationally too heavy.



Figures 7b, 7c, and 7d contain the resulting $\tilde{L}$- and $\underline{\tilde{L}}^*_{(20)}(\tilde{L})$-demixed images. Of all
preliminary estimators considered, $\tilde{L}_{KIca}$ seems to provide the best results. In Figure 7d, we
therefore also provide the demixed images resulting from the Ilmonen and Paindaveine
estimator $\underline{\tilde{L}}^*_{+(20)}(\tilde{L}_{KIca})$ with Kernel-ICA preliminary. Irrespective of the preliminary, there is
a clear and quite significant visual enhancement, attributable to the use of ranks, in the
R-estimation method. Our R-estimators, moreover, substantially outperform the signed-rank
ones.

Those eye-inspection conclusions are confirmed and reinforced by the graphs in Figure 8,
which reports the Amari errors $AE\big(\underline{\tilde{L}}^*_{(T)}(\tilde{L}), L^*\big)$ (4.32) for the R- and $R_+$-estimators of $L^*$
and T = 0, ..., 20. As T increases, for all multistep R-estimators those errors appear to converge
to some common limit independent of the preliminary $\tilde{L}$. For $\tilde{L} = \tilde{L}_{FIca}$ or $\tilde{L}_{KIca}$,
the decrease is quite significant over T = 1, ..., 5. The same decrease is much slower
for $\tilde{L} = \tilde{L}_{Fobi}$, but the final result, as T gets close to 20, is the same, suggesting that
rank-based corrections eventually compensate for a poorer performance of the preliminary
estimator.

The same Amari errors $AE\big(\underline{\tilde{L}}^*_{+(T)}(\tilde{L}), L^*\big)$ were evaluated for the multistep (and
data-driven-score) versions $\underline{\tilde{L}}^*_{+(T)}(\tilde{L})$ of the Ilmonen and Paindaveine $R_+$-estimators. The results,
in Figure 8b, clearly show that the signed-rank method fails, which is hardly surprising,
since there is little reason for "greyness" in the source images considered here to exhibit any
symmetric behavior.

Figure 7: Figure 7a contains the three source images and the three mixed ones. Figures 7b, 7c, and 7d show the demixed
images obtained from multistep data-driven skew t-score R-estimators, based on FOBI, FastICA, and Kernel-ICA preliminaries,
respectively. In Figure 7d, the results of a Kernel-ICA-based, data-driven Student's t-score multistep $R_+$-estimator are
also provided.

(a) Top row: the three source images. Bottom row: the three mixed images.

(b) FOBI preliminary. Top row: the $\tilde{L}_{Fobi}$-demixed images. Bottom row: the $\underline{\tilde{L}}^*_{(20)}(\tilde{L}_{Fobi})$-demixed images.

(c) FastICA preliminary. Top row: the $\tilde{L}_{FIca}$-demixed images. Bottom row: the $\underline{\tilde{L}}^*_{(20)}(\tilde{L}_{FIca})$-demixed images.

(d) Kernel-ICA preliminary. Top row: the $\tilde{L}_{KIca}$-demixed images. Middle row: the $\underline{\tilde{L}}^*_{(20)}(\tilde{L}_{KIca})$-demixed images.
Bottom row: the $\underline{\tilde{L}}^*_{+(20)}(\tilde{L}_{KIca})$-demixed images.



Figure 8: The Amari errors with respect to $L^*$ for the multistep R-estimators $\underline{\tilde{L}}^*_{(T)}(\tilde{L})$ and the multistep
$R_+$-estimators $\underline{\tilde{L}}^*_{+(T)}(\tilde{L})$ shown in Figure 7 and based on the preliminary estimators $\tilde{L} = \tilde{L}_{Fobi}$, $\tilde{L}_{FIca}$, and $\tilde{L}_{KIca}$,
for T = 1, ..., 20.

A Appendix: Proofs



A.1 Proof of Proposition 2.1



Oja et al. (2010) establish ULAN for ICA models under the assumption that each $f_j$ is symmetric. Their proof consists in showing that the sufficient conditions of Lemma 1 in Swensen (1985) are satisfied. Mutatis mutandis, that proof still goes through in the present case, with the same central sequence; only the information matrix is affected. That matrix depends on the covariance of $\mathrm{vec}(T^{(n)}_{\mu,L,f})$ under $P^{(n)}_{\mu,L,f}$, which takes the form

$$\mathrm{E}\big[\mathrm{vec}(T^{(n)}_{\mu,L,f})\,\mathrm{vec}(T^{(n)}_{\mu,L,f})'\big] = \sum_{r,s,p,q=1}^{k} \mathrm{E}\big[(T^{(n)}_{\mu,L,f})_{rp}\,(T^{(n)}_{\mu,L,f})_{sq}\big]\,\big(e_p e_q' \otimes e_r e_s'\big).$$

Because $T^{(n)}_{\mu,L,f}$ is a sum of i.i.d. random variables with expectation zero,

$$\mathrm{E}\big[(T^{(n)}_{\mu,L,f})_{rp}\,(T^{(n)}_{\mu,L,f})_{sq}\big] = \mathrm{E}\big[\big(\varphi_{f_r}(Z_{1,r})Z_{1,p} - \delta_{rp}\big)\big(\varphi_{f_s}(Z_{1,s})Z_{1,q} - \delta_{sq}\big)\big], \qquad r, s, p, q \in \{1, \ldots, k\},$$

where the $Z_{ij}$'s are i.i.d. with density $f_j$ under $P^{(n)}_{\mu,L,f}$ and $\delta_{rp}$ is the classical Kronecker index. Evaluating those expectations yields $G_f$ defined in (2.5). □
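The expansion of the second-moment matrix of $\mathrm{vec}(T)$ over canonical basis vectors rests on a purely algebraic identity: for any $k \times k$ matrix $T$ and the column-stacking vec operator, $\mathrm{vec}(T)\mathrm{vec}(T)' = \sum_{r,s,p,q} T_{rp}T_{sq}\,(e_p e_q' \otimes e_r e_s')$. The following Python sketch checks that identity numerically; the index convention and the helper names are assumptions made for illustration only.

```python
import numpy as np

def vec(m):
    """Stack the columns of m into a single vector (column-major vec operator)."""
    return m.reshape(-1, order="F")

def outer_via_basis(t):
    """Rebuild vec(t) vec(t)' as sum of t[r,p] t[s,q] (e_p e_q' kron e_r e_s')."""
    k = t.shape[0]
    e = np.eye(k)  # e[j] is the j-th canonical basis vector
    total = np.zeros((k * k, k * k))
    for r in range(k):
        for p in range(k):
            for s in range(k):
                for q in range(k):
                    basis = np.kron(np.outer(e[p], e[q]), np.outer(e[r], e[s]))
                    total += t[r, p] * t[s, q] * basis
    return total
```

Taking expectations entry by entry then reduces the covariance computation to the scalar moments $\mathrm{E}[T_{rp}T_{sq}]$.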



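A.2 Proofs for Propositions 2.2 and 2.3

Propositions 2.2(i) and 2.3(i) follow from Lemma A.1 below, itself adapted from Theorem V.1.8 in Hájek and Šidák (1967). Consider a triangular array $(U^{(n)}_1, V^{(n)}_1), \ldots, (U^{(n)}_n, V^{(n)}_n)$, $n \in \mathbb{N}$, and two scores $\varphi_U$, $\varphi_V$ such that

(D1) $U^{(n)}_i$ and $V^{(n)}_i$, $i = 1, \ldots, n$, are uniform over $[0, 1]$ and mutually independent, and
(D2) $\varphi_U, \varphi_V : (0, 1) \to \mathbb{R}$ are square-integrable and satisfy (A5).

Denote by $R^{(n)}_i$ the rank of $U^{(n)}_i$ amongst $U^{(n)}_1, \ldots, U^{(n)}_n$, by $Q^{(n)}_i$ the rank of $V^{(n)}_i$ amongst $V^{(n)}_1, \ldots, V^{(n)}_n$, and define

$$a^{(n)}_{\mathrm{ex}}(i) := \mathrm{E}\big[\varphi_U(U^{(n)}_1) \mid R^{(n)}_1 = i\big], \qquad a^{(n)}_{\mathrm{appr}}(i) := \varphi_U\Big(\frac{i}{n+1}\Big),$$

$$b^{(n)}_{\mathrm{ex}}(i) := \mathrm{E}\big[\varphi_V(V^{(n)}_1) \mid Q^{(n)}_1 = i\big], \qquad \text{and} \qquad b^{(n)}_{\mathrm{appr}}(i) := \varphi_V\Big(\frac{i}{n+1}\Big).$$

Assumption (D2) implies

$$\lim_{n\to\infty} \frac{\sum_{i=1}^{n}\big(a^{(n)}_{\mathrm{ex}}(i) - \bar{\varphi}_U\big)^2}{\max_{1\le i\le n}\big(a^{(n)}_{\mathrm{ex}}(i) - \bar{\varphi}_U\big)^2} = \infty \qquad \text{and} \qquad \lim_{n\to\infty} \frac{\sum_{i=1}^{n}\big(b^{(n)}_{\mathrm{ex}}(i) - \bar{\varphi}_V\big)^2}{\max_{1\le i\le n}\big(b^{(n)}_{\mathrm{ex}}(i) - \bar{\varphi}_V\big)^2} = \infty. \tag{A.33}$$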

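Let

$$S^{(n)}_{\mathrm{ex}} := \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Big(a^{(n)}_{\mathrm{ex}}(R^{(n)}_i)\, b^{(n)}_{\mathrm{ex}}(Q^{(n)}_i) - \bar{\varphi}_U \bar{\varphi}_V\Big), \tag{A.34}$$

where $\bar{\varphi}_U := \int_0^1 \varphi_U(u)\,du$ and $\bar{\varphi}_V := \int_0^1 \varphi_V(v)\,dv$; note that

$$\bar{\varphi}_U = \frac{1}{n}\sum_{i=1}^{n} a^{(n)}_{\mathrm{ex}}(i)$$

and, similarly, $\bar{\varphi}_V = \frac{1}{n}\sum_{i=1}^{n} b^{(n)}_{\mathrm{ex}}(i)$. Also define

$$S^{(n)}_{\mathrm{appr}} := \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Big(a^{(n)}_{\mathrm{appr}}(R^{(n)}_i)\, b^{(n)}_{\mathrm{appr}}(Q^{(n)}_i) - \bar{a}^{(n)}_{\mathrm{appr}} \bar{b}^{(n)}_{\mathrm{appr}}\Big), \tag{A.35}$$

where $\bar{a}^{(n)}_{\mathrm{appr}} := \frac{1}{n}\sum_{i=1}^{n} a^{(n)}_{\mathrm{appr}}(i)$ and $\bar{b}^{(n)}_{\mathrm{appr}} := \frac{1}{n}\sum_{i=1}^{n} b^{(n)}_{\mathrm{appr}}(i)$. The following lemma shows that both $S^{(n)}_{\mathrm{ex}}$ and $S^{(n)}_{\mathrm{appr}}$ admit the asymptotic representation

$$T^{(n)} := \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Big(\varphi_U(U^{(n)}_i)\,\varphi_V(V^{(n)}_i) - \bar{\varphi}^{(n)}_U \bar{\varphi}^{(n)}_V\Big), \tag{A.36}$$

where $\bar{\varphi}^{(n)}_U = \frac{1}{n}\sum_{i=1}^{n} \varphi_U(U^{(n)}_i)$ and $\bar{\varphi}^{(n)}_V = \frac{1}{n}\sum_{i=1}^{n} \varphi_V(V^{(n)}_i)$.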
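Condition (A.33) is of the Noether type: it compares the sum of squared centred exact scores with the largest squared centred score. As a numerical illustration (not part of the proof), take the hypothetical score $\varphi_V(v) = -\log(1-v)$, the quantile function of the unit exponential distribution. Its exact scores are the expected exponential order statistics $\sum_{m=n-i+1}^{n} 1/m$, so the ratio in (A.33) can be computed without simulation. The function names below are assumptions made for this sketch.

```python
import numpy as np

def exact_exponential_scores(n):
    """Exact scores E[-log(1 - V_(i))], i = 1..n, for uniform order statistics V_(i)."""
    # E[X_(i)] for Exp(1) order statistics equals 1/n + 1/(n-1) + ... + 1/(n-i+1).
    return np.cumsum(1.0 / np.arange(n, 0, -1))

def noether_ratio(scores):
    """Sum of squared centred scores divided by the largest squared centred score."""
    centered = scores - scores.mean()
    return np.sum(centered ** 2) / np.max(centered ** 2)

for n in (10, 100, 1000):
    print(n, noether_ratio(exact_exponential_scores(n)))
```

The mean of these exact scores equals $\int_0^1 \varphi_V(v)\,dv = 1$ for every $n$, in line with the remark following (A.34), and the printed ratio grows with $n$.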

Lemma A.1. Let $(U^{(n)}_1, V^{(n)}_1), \ldots, (U^{(n)}_n, V^{(n)}_n)$ and the scores $\varphi_U$, $\varphi_V$ satisfy (D1)-(D2). Then, as $n \to \infty$,

$$\text{(i)}\ \ S^{(n)}_{\mathrm{appr}} = S^{(n)}_{\mathrm{ex}} + o_{L^2}(1) \qquad \text{and} \qquad \text{(ii)}\ \ S^{(n)}_{\mathrm{appr}} = T^{(n)} + o_{L^2}(1), \tag{A.37}$$

with $S^{(n)}_{\mathrm{ex}}$, $S^{(n)}_{\mathrm{appr}}$, and $T^{(n)}$ defined in (A.34), (A.35), and (A.36), respectively.
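The content of part (ii) of the lemma can be illustrated by simulation. The Python sketch below draws independent uniform pairs as in (D1), computes the approximate-score statistic of (A.35) and the statistic of (A.36), and estimates the mean squared gap between them for increasing $n$. The two score functions are hypothetical choices made for this illustration (whether they satisfy the paper's Assumption (A5) is assumed, not checked), and all function names are likewise assumptions.

```python
import numpy as np

def s_appr_and_t(u, v, phi_u, phi_v):
    """Return (S_appr, T) as in (A.35)-(A.36) for one sample of n pairs."""
    n = len(u)
    # 1-based ranks R_i (of u_i among the u's) and Q_i (of v_i among the v's).
    r = np.argsort(np.argsort(u)) + 1
    q = np.argsort(np.argsort(v)) + 1
    # Approximate scores evaluated at R_i/(n+1) and Q_i/(n+1).
    a, b = phi_u(r / (n + 1)), phi_v(q / (n + 1))
    s_appr = np.sum(a * b - a.mean() * b.mean()) / np.sqrt(n)
    # T uses the scores at the observations themselves, centred at sample means.
    fu, fv = phi_u(u), phi_v(v)
    t = np.sum(fu * fv - fu.mean() * fv.mean()) / np.sqrt(n)
    return s_appr, t

def mean_squared_gap(n, reps, rng, phi_u, phi_v):
    """Monte Carlo estimate of E[(S_appr - T)^2] under (D1)."""
    gaps = np.empty(reps)
    for j in range(reps):
        s, t = s_appr_and_t(rng.random(n), rng.random(n), phi_u, phi_v)
        gaps[j] = (s - t) ** 2
    return gaps.mean()

# HYPOTHETICAL scores: a Wilcoxon-type score and the Exp(1) quantile function.
phi_u = lambda u: 2.0 * u - 1.0
phi_v = lambda v: -np.log1p(-v)

rng = np.random.default_rng(0)
for n in (50, 500, 5000):
    print(n, mean_squared_gap(n, 200, rng, phi_u, phi_v))
```

The estimated mean squared gap should shrink as $n$ grows, which is what convergence in $L^2$ of $S^{(n)}_{\mathrm{appr}} - T^{(n)}$ to zero means.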



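Proof. Let us show that

$$\text{(i')}\ \lim_{n\to\infty} \mathrm{E}\big[(S^{(n)}_{\mathrm{appr}} - S^{(n)}_{\mathrm{ex}})^2\big] = 0 \qquad \text{and} \qquad \text{(ii')}\ \lim_{n\to\infty} \mathrm{E}\big[(S^{(n)}_{\mathrm{ex}} - T^{(n)})^2\big] = 0; \tag{A.38}$$

while (A.37)(i) is the same as (i'), (A.37)(ii) is a consequence of (i'), (ii'), and the triangle inequality.

Defining the antirank of $V^{(n)}_i$ with respect to $U^{(n)}_i$ by $Q^{(n)}_{(i)} := \{r : R^{(n)}_r = i\}$ (so that $R^{(n)}_{Q^{(n)}_{(i)}} = i$), the sequence $(Q^{(n)}_{(1)}, \ldots, Q^{(n)}_{(n)})$ is uniformly distributed over the permutations of $\{1, \ldots, n\}$ in view of the independence between the $U^{(n)}_i$'s and the $V^{(n)}_i$'s. Reordering terms, we have

[display equation illegible in the source]

Write $S^{(n)}_{\mathrm{ex}} = S^{(n)}_{a} + S^{(n)}_{b}$, where

[display equation illegible in the source]

where $R^{(n)}_{(i)} := \{r : Q^{(n)}_r = i\}$ denotes the antirank of $U^{(n)}_i$ with respect to $V^{(n)}_i$. Assumption (A5), (A.33), Lemma V.1.6a, and Theorem V.1.6a from Hájek and Šidák (1967) together imply $\lim_{n\to\infty} \mathrm{E}[(S^{(n)}_{\mathrm{appr}} - S^{(n)}_{a})^2] = 0$ and $\lim_{n\to\infty} \mathrm{E}[(S^{(n)}_{b})^2] = 0$, which, along with the triangle inequality, establishes (i') in (A.38).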
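Ranks and antiranks are inverse permutations of each other: the $i$-th antirank is the index of the observation whose rank is $i$. The short Python sketch below makes this concrete for a single sample without ties; the function name and the sample values are hypothetical.

```python
import numpy as np

def ranks_and_antiranks(u):
    """Return 1-based ranks R_r and antiranks D_i of the entries of u (no ties)."""
    order = np.argsort(u)  # order[i-1] = 0-based index of the i-th smallest entry
    ranks = np.empty(len(u), dtype=int)
    ranks[order] = np.arange(1, len(u) + 1)
    antiranks = order + 1  # D_i = 1-based index r such that R_r = i
    return ranks, antiranks

u = np.array([0.42, 0.91, 0.07, 0.55])
ranks, antiranks = ranks_and_antiranks(u)
# The rank of the observation indexed by the i-th antirank is i,
# and u[antiranks - 1] lists the order statistics in increasing order.
```

Reordering a sum over observations by antiranks, as in the proof, amounts to summing over the order statistics instead.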



Let $U^{(n)}_{(\cdot)} := (U^{(n)}_{(1)}, \ldots, U^{(n)}_{(n)})'$ and $V^{(n)}_{(\cdot)} := (V^{(n)}_{(1)}, \ldots, V^{(n)}_{(n)})'$ denote the order statistics for the $n$-tuples $\{U^{(n)}_i\}_{i=1}^{n}$ and $\{V^{(n)}_i\}_{i=1}^{n}$, respectively. Because the antiranks $R^{(n)}_{(i)}$ are uniformly distributed and independent of $R^{(n)}_1, \ldots, R^{(n)}_n$, the $R^{(n)}_{(i)}$th order statistic $U^{(n)}_{(R^{(n)}_{(i)})}$ is uniformly distributed over the unit interval (the same is true for the $Q^{(n)}_{(i)}$th order statistic $V^{(n)}_{(Q^{(n)}_{(i)})}$).

Write $T^{(n)} = T^{(n)}_{a} + T^{(n)}_{b}$, where

[display equation illegible in the source]

Then (A.33) and Theorem V.1.5a from Hájek and Šidák (1967) imply that

$$\lim_{n\to\infty} \mathrm{E}\big[(S^{(n)}_{\mathrm{ex}} - T^{(n)}_{a})^2\big] = 0 \qquad \text{and} \qquad \lim_{n\to\infty} \mathrm{E}\big[(T^{(n)}_{b})^2\big] = 0,$$

which establishes (ii') in (A.38). □



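Proof of Proposition 2.2. All expectations in this section are under $P^{(n)}_{\mu,L,f}$, unless otherwise specified; $R_i$ stands for $R^{(n)}_i(L)$, $i = 1, \ldots, n$. For part (i) of the proposition to hold, it is sufficient that, for $T^{(n)}_{L,f;\mathrm{ex}}$ and $\underset{\sim}{T}{}^{(n)}_{L,f}$ in (2.9) and (2.14),

$$(T^{(n)}_{L,f;\mathrm{ex}})_{rs} = (\underset{\sim}{T}{}^{(n)}_{L,f})_{rs} + o_{L^2}(1) \quad \text{for all } r, s \in \{1, \ldots, k\}, \text{ as } n \to \infty. \tag{A.39}$$

First, fix $r \neq s \in \{1, \ldots, k\}$. Then,

$$(T^{(n)}_{L,f;\mathrm{ex}})_{rs} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \mathrm{E}\big[J_{f_r}(U_{ir}) \mid R_{ir}\big]\, \mathrm{E}\big[F_s^{-1}(U_{is}) \mid R_{is}\big]$$

by independence between distinct components, and

[display equation illegible in the source]

Letting $\varphi_U = J_{f_r}$ and $\varphi_V = F_s^{-1}$, (A.39) (for $r \neq s$) thus directly follows from Lemma A.1.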



For $r = s$, the Hájek projection theorem for linear rank statistics and the convergence rate of Riemann sums imply

[display equation illegible in the source]

as $n \to \infty$, under $P^{(n)}_{\mu,L,f}$. This establishes part (i) of Proposition 2.2. As for part (ii), it follows from the results in Hallin and Werker (2003) that $\Delta^{(n)}_{L,f;\mathrm{ex}} = \Delta^{(n)*}_{\mu,L,f} + o_{L^2}(1)$ as $n \to \infty$, under $P^{(n)}_{\mu,L,f}$. This, along with part (i) of the proposition and the triangle inequality, implies part (ii). □



Proof of Proposition 2.3. In order to establish part (i) of the proposition, it is sufficient to show that, for every $r \neq s \in \{1, \ldots, k\}$, $(\underset{\sim}{T}{}^{(n)}_{L,f})_{rs} = (T^{(n)}_{L;f,g})_{rs} + o_{L^2}(1)$ as $n \to \infty$, under $P^{(n)}_{\mu,L,g}$. Let $V^{(n)}_i := G(Z^{(n)}_i) =: (V^{(n)}_{i1}, \ldots, V^{(n)}_{ik})'$, $i = 1, \ldots, n$. The rank of $V^{(n)}_{ij}$ amongst $V^{(n)}_{1j}, \ldots, V^{(n)}_{nj}$ is $R^{(n)}_{ij}(L)$ for each $j = 1, \ldots, k$. The claim follows from Lemma A.1 by taking score functions $J_{f_r}$ and $F_s^{-1}$.

The proofs of parts (ii) and (iii) follow from those of Theorem 3.2(ii) and (iii) in Ilmonen and Paindaveine (2011). However, the presence of asymmetry in the independent components implies different cross-information matrices. The result is obtained, via Le Cam's Third Lemma, from an evaluation of the covariance matrix in the asymptotically normal joint distribution of $\underset{\sim}{\Delta}{}^{(n)}_{L,f}$ and (2.8) under $P^{(n)}_{\mu,L,g}$. That covariance matrix follows from the covariance of $\Delta^{(n)}_{L;f,g}$ and $\Delta^{(n)}_{\mu,L,g}$ under $P^{(n)}_{\mu,L,g}$, which depends on

[display equation illegible in the source; only the summation range $r, s, p, q = 1, \ldots, k$ survives]

Evaluating this expression eventually yields the value of $G_{f,g}$ appearing in (2.15) for the cross-information matrix. □